Abstract



In this paper, we outline the theory of epidemic percolation networks and their 
use in the analysis of stochastic SIR epidemic models on undirected contact
networks. We then show how the same theory can be used to analyze stochastic
SIR models with random and proportionate mixing. The epidemic percolation 
networks for these models are purely directed because undirected edges disap- 
pear in the limit of a large population. In a series of simulations, we show 
that epidemic percolation networks accurately predict the mean outbreak size 
and probability and final size of an epidemic for a variety of epidemic models in 
homogeneous and heterogeneous populations. Finally, we show that epidemic 
percolation networks can be used to re-derive classical results from several dif- 
ferent areas of infectious disease epidemiology. In an appendix, we show that an 
epidemic percolation network can be defined for any time-homogeneous stochas- 
tic SIR model in a closed population and prove that the distribution of outbreak 
sizes given the infection of any given node in the SIR model is identical to the 
distribution of its out-component sizes in the corresponding probability space 
of epidemic percolation networks. We conclude that the theory of percolation 
on semi-directed networks provides a very general framework for the analysis of 
stochastic SIR models in closed populations. 



1 Introduction 



In an important paper, M. E. J. Newman studied a network-based
Susceptible-Infectious-Removed (SIR) epidemic model in which infection is transmitted
through a network of contacts between individuals [1]. The contact network 
itself was a random undirected network with an arbitrary degree distribution 
of the form studied by Newman, Strogatz, and Watts [2]. Given the degree 
distribution, these networks are maximally random. Thus, they have no small 
loops in the limit of a large population [2-4].

In the SIR model from [1], the probability that an infected node i makes
infectious contact with an adjacent node j is $1 - \exp(-r_i \beta_{ij})$,
where $\beta_{ij}$ is the rate of infectious contact from i to j and $r_i$ is the
time that i remains infectious. (In this paper, we use infectious contact to mean
a contact that results in infection if and only if the recipient is susceptible.)
The recovery period $r_i$ is a random variable with the cumulative distribution
function (cdf) $F(r)$ and the infectious contact rate $\beta_{ij}$ has the cdf
$F(\beta)$. The infectious
periods for all individuals are independent and identically distributed (iid) and 
the infectious contact rates for all ordered pairs of individuals are iid. 

This model can be analyzed by mapping the SIR model onto a semi-directed 
network that we call the epidemic percolation network [5]. Since the distribution
of recovery periods for all nodes and the joint distribution of contact rates 
for all pairs of connected nodes are defined a priori, all relevant transmission 
probabilities can be determined by assigning the infectious periods and contact 
rate pairs before an epidemic begins. Starting from the contact network, a single 
realization of the epidemic percolation network can be generated as follows: 

1. Choose a recovery period $r_i$ for every node i and choose a contact rate
$\beta_{ij}$ for every ordered pair of connected nodes i and j in the contact
network.

2. For each pair of connected nodes i and j in the contact network, convert
the undirected edge between them to a directed edge from i to j with
probability

$$(1 - e^{-r_i \beta_{ij}})\, e^{-r_j \beta_{ji}},$$

to a directed edge from j to i with probability

$$e^{-r_i \beta_{ij}}\,(1 - e^{-r_j \beta_{ji}}),$$

and erase the edge completely with probability
$\exp(-r_i \beta_{ij} - r_j \beta_{ji})$. The edge remains undirected with
probability

$$(1 - e^{-r_i \beta_{ij}})(1 - e^{-r_j \beta_{ji}}).$$
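The two steps above can be sketched in code. The following Python function is a minimal illustration under stated assumptions, not the implementation used in this paper: the name `sample_epn` and the callables `sample_recovery` and `sample_rate` are hypothetical, and the two contact rates of each pair are drawn independently, as in the model from [1]. Conditional on the recovery periods and contact rates, infectious contact in each direction is an independent event, which reproduces the four probabilities in step 2.

```python
import math
import random

def sample_epn(edges, sample_recovery, sample_rate, rng=None):
    """Return (directed, undirected) edge lists for one realization.

    edges           -- iterable of undirected contact-network edges (i, j)
    sample_recovery -- hypothetical callable rng -> recovery period r_i
    sample_rate     -- hypothetical callable rng -> contact rate beta_ij
    """
    rng = rng or random.Random()
    nodes = {v for e in edges for v in e}
    r = {i: sample_recovery(rng) for i in nodes}          # step 1: recovery periods
    directed, undirected = [], []
    for i, j in edges:
        beta_ij = sample_rate(rng)                        # step 1: contact rates
        beta_ji = sample_rate(rng)
        # Step 2: i makes infectious contact with j w.p. 1 - exp(-r_i * beta_ij),
        # and independently j with i w.p. 1 - exp(-r_j * beta_ji).
        i_to_j = rng.random() < 1.0 - math.exp(-r[i] * beta_ij)
        j_to_i = rng.random() < 1.0 - math.exp(-r[j] * beta_ji)
        if i_to_j and j_to_i:
            undirected.append((i, j))                     # edge stays undirected
        elif i_to_j:
            directed.append((i, j))                       # directed edge i -> j
        elif j_to_i:
            directed.append((j, i))                       # directed edge j -> i
        # otherwise the edge is erased
    return directed, undirected
```

Repeated calls draw independent realizations, so Monte Carlo estimates of component sizes can be formed by averaging over many sampled networks.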

The epidemic percolation network is a semi-directed network that represents 
a single realization of the infectious contact process for each connected pair of 
nodes, so $4^m$ possible epidemic percolation networks exist for a contact
network with m edges. The probability of each network is determined by the
underlying SIR model. The epidemic percolation network is very similar to the
locally dependent random graph defined by Kuulasmaa [6] for an epidemic on
a d-dimensional lattice, with two important differences: First, the underlying 
structure of the contact network is not assumed to be a lattice. Second, we 
replace pairs of (occupied) directed edges between two nodes with a single undi- 
rected edge. The idea of the epidemic percolation network is also similar to 
the idea of forward and backward branching processes, which have been used to 
derive the probability and final size of an epidemic, respectively, in SIR models 
with independent infectiousness and susceptibility [7]. The epidemic percola- 
tion network can be thought of as a simultaneous mapping of the forward and 
backward branching processes that generalizes to models with arbitrary joint 
distributions of infectiousness and susceptibility. (The relationship between 
epidemic percolation networks and branching processes is discussed further in 
Section 5.1.) 

In the Appendix, we define epidemic percolation networks for a very gen- 
eral time-homogeneous stochastic SIR epidemic model (which includes network- 
based models and models with random and proportionate mixing as special 
cases) and prove that the size distribution of outbreaks starting from node i 
is identical to the distribution of its out-component sizes in the corresponding 
probability space of percolation networks. Because of this equality of distribu- 
tion, epidemic percolation networks can be used to analyze a much more general 
class of epidemic models than that defined in the Introduction. In this paper, 
we show how they can be used to analyze stochastic SIR epidemic models with 
random or proportionate mixing. 

1.1 Structure of semi-directed networks 

In this subsection, we review the structure of directed and semi-directed net- 
works as discussed in [3,4,8,9]. Reviews of the structure and analysis of 
undirected and purely directed networks can be found in [10-13]. 

The indegree and outdegree of node i are the number of incoming and out- 
going directed edges incident to i. Since each directed edge is an outgoing edge 
for one node and an incoming edge for another node, the mean indegree and 
outdegree of a semi-directed network are equal. The undirected degree of node 
i is the number of undirected edges incident to i. 

A component is a maximal group of connected nodes. The size of a com- 
ponent is the number of nodes it contains and its relative size is its size divided 
by the total size of the network. There are four types of components in a 
semi-directed network. 

The out-component of node i includes i and all nodes that can be reached
from i by following a series of edges in the proper direction (undirected edges 
are bidirectional). The in-component of node i includes i and all nodes from 
which i can be reached by following a series of edges in the proper direction. 
By definition, node i is in the in-component of node j if and only if j is in the 
out-component of i. Therefore, the mean size of in- and out-components in any 
semi-directed network must be equal. 

The strongly-connected component of a node i is the intersection of its in-
and out-components; it is the set of all nodes that can be reached from node
i and from which node i can be reached. All nodes in a strongly-connected 
component have the same in-component and the same out-component. The 
weakly-connected component of node i is the set of nodes that are connected to 
i when the direction of the edges is ignored. 
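The component definitions above translate directly into graph searches. The sketch below is illustrative only: the function names and the edge-list representation (a list of directed pairs `(a, b)` meaning a to b, plus a list of undirected pairs) are assumptions made for this example. It uses a breadth-first search that treats undirected edges as bidirectional.

```python
from collections import deque

def out_component(node, directed, undirected):
    """Nodes reachable from `node` following edges in the proper direction."""
    succ = {}
    for a, b in directed:
        succ.setdefault(a, set()).add(b)
    for a, b in undirected:                 # undirected edges are bidirectional
        succ.setdefault(a, set()).add(b)
        succ.setdefault(b, set()).add(a)
    seen = {node}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in succ.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def in_component(node, directed, undirected):
    """Nodes from which `node` can be reached: search the reversed network."""
    reversed_edges = [(b, a) for a, b in directed]
    return out_component(node, reversed_edges, undirected)

def strongly_connected_component(node, directed, undirected):
    """Intersection of the in- and out-components of `node`."""
    return (out_component(node, directed, undirected)
            & in_component(node, directed, undirected))
```

For example, with `directed = [(0, 1), (1, 2), (2, 1)]` and `undirected = [(2, 3)]`, node 0 reaches every node but is reached by none, while nodes 1, 2, and 3 form one strongly-connected component.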

For giant components, we use the definitions given in [9, 14]. Giant compo- 
nents are so called because they have asymptotically positive relative size in the 
limit of a large population. All other components are "small" in the sense that 
they have asymptotically zero relative size. There are two phase transitions in 
a semi-directed network: One where a unique giant weakly-connected compo- 
nent (GWCC) emerges and another where unique giant in-, out-, and strongly- 
connected components (GIN, GOUT, and GSCC) emerge. The GWCC contains 
the other three giant components. The GSCC is the intersection of the GIN 
and the GOUT, which are the common in- and out-components of nodes in the 
GSCC. Tendrils are components in the GWCC that are outside the GIN and 
the GOUT. Tubes are directed paths from the GIN to the GOUT that do not 
intersect the GSCC. All tendrils and tubes are small components. A schematic 
representation of these components is shown in Figure 1.

1.2 Epidemic percolation networks and epidemics 

An outbreak begins when one or more nodes are infected from outside the popu- 
lation. These are called imported infections. The final size of an outbreak is the 
number of nodes that are infected before the end of transmission, and its relative 
final size is its final size divided by the total size of the network. The nodes 
infected in the outbreak can be identified with the nodes in the out-components 
of the imported infections. This identification is made mathematically precise 
in the Appendix. 

We define a self-limited outbreak to be an outbreak whose relative final size 
approaches zero in the limit of a large population. An epidemic is an outbreak 
whose relative final size is positive in the limit of a large population. For 
many SIR epidemic models (including the one in the Introduction), there is an 
epidemic threshold: The probability of an epidemic is zero below the epidemic 
threshold and the probability and relative final size of an epidemic are positive 
above the epidemic threshold [1, 7, 15, 16]. 

If all out-components in the epidemic percolation network are small, then 
only self-limited outbreaks are possible. If the epidemic percolation network 
contains a GSCC, then any infection in the GIN will lead to the infection of the 
entire GOUT. Therefore, the epidemic threshold corresponds to the emergence 
of the GSCC in the epidemic percolation network. For any set of imported 
infections, the probability of an epidemic is equal to the probability that at 
least one imported infection occurs in the GIN. For any finite set of imported 
infections, the relative final size of an epidemic is asymptotically equal to the 
proportion of the network contained in the GOUT. Although some nodes out- 
side the GOUT may be infected (e.g. tendrils and tubes), they will constitute 
a finite number of small components whose total relative size is asymptotically
zero.

This argument can be extended to epidemic percolation networks for hetero- 
geneous populations. The size distribution of outbreaks starting from an initial 
infection in any given node i is equal to the distribution of the out-component
sizes of node i in the probability space of epidemic percolation networks. In the 
limit of a large population, the probability that the infection of node i causes an 
epidemic is equal to the probability that i is in the GIN and the probability that 
i is infected in an epidemic is equal to the probability that i is in the GOUT. 
Note that the size distribution of outbreaks and the probability of an epidemic 
can depend on the initial infection(s), but the relative final size of an epidemic 
does not. 

1.3 Random and proportionate mixing 

In this paper, we show how epidemic percolation networks can be used to analyze 
stochastic SIR epidemic models with random or proportionate mixing. Methods 
exist to calculate the final size distribution of epidemics for such models in a 
population of size n, but they require solving a recursive system of n equations 
[15, 17]. We will show how the size distribution of outbreaks, the epidemic
threshold, and the probability and relative final size of a large epidemic can be 
calculated in the limit of large n by solving a much simpler set of equations. 
These methods also generalize more easily to heterogeneous populations. We 
will show that these methods are equivalent to branching processes when the 
indegree and outdegree of the epidemic percolation network are independent, 
and we will use them to re-derive classical results from several areas of theoretical 
infectious disease epidemiology. 

The rest of the paper is organized as follows: In Section 2, we find the de- 
gree distributions of the epidemic percolation networks corresponding to SIR 
models with random and proportionate mixing. In Section 3, we review the
use of probability generating functions to analyze semi-directed networks and 
show how these simplify in the case of purely directed networks. In Section 4, 
we present a series of simulations to show that epidemic percolation networks 
accurately predict the mean outbreak size and the probability and final size of 
an epidemic for SIR models with random and proportionate mixing. In Sec- 
tion 5, we show that epidemic percolation networks with independent indegree 
and outdegree are equivalent to (forward and backward) branching processes 
and re-derive classical results from the epidemiology of sexually transmitted dis- 
eases, vector-borne diseases, and controlled diseases. In the Appendix, we show 
that an epidemic percolation network can be defined for any time-homogeneous 
stochastic SIR epidemic model in a closed population and prove that the dis- 
tribution of outbreaks starting from a node i is equal to the distribution of its 
out-component sizes in the corresponding probability space of epidemic perco- 
lation networks. We conclude that the theory of percolation on semi-directed 
networks provides a very general framework for the analysis of stochastic SIR 
epidemic models in closed populations.

2 Epidemics with random mixing



In this section, we derive probability generating functions for the degree dis- 
tributions of epidemic percolation networks corresponding to SIR models with 
random mixing and proportionate mixing. The most important difference be- 
tween epidemic models with random mixing and network-based models is that 
the infectious contact rate between any pair of nodes is inversely proportional 
to the population size [7, 15, 18]. We deal first with random mixing and
introduce proportionate mixing as a generalization. To introduce random mixing,
we modify the epidemic model from the Introduction in three ways: 

1. The contact network is always a complete graph, so infection can be trans- 
mitted between any two individuals. 

2. We relax the assumption (from [1,5]) that $\beta_{ij}$ and $\beta_{ji}$ are iid.
Instead, we let $\beta_{ij}$ and $\beta_{ji}$ have a joint distribution
$F(\beta_{ij}, \beta_{ji})$ that is symmetric in its arguments (i.e.
$F(\beta_1, \beta_2) = F(\beta_2, \beta_1)$ for all $\beta_1, \beta_2$). This symmetry
forces the joint distribution of contact rates between any two individuals
to be independent of the indices assigned to them.

3. In a population of size n, the contact rate from i to j is
$\beta_{ij}(n-1)^{-1}$ and the contact rate from j to i is $\beta_{ji}(n-1)^{-1}$,
where $\beta_{ij}$ and $\beta_{ji}$ have the joint distribution
$F(\beta_{ij}, \beta_{ji})$ from above.

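The epidemic percolation network for a random mixing model is defined
in the same way as that for a network-based model, except that
$\beta_{ij}(n-1)^{-1}$ replaces $\beta_{ij}$. Let
$g(x,y,u \mid n, r_i, \beta_{ij}; r_j, \beta_{ji})$ be the conditional probability
generating function (pgf) for the number of incoming, outgoing, and undirected
edges incident to node i that appear between i and j in the epidemic
percolation network given n, $r_i$, $\beta_{ij}$, $r_j$, and $\beta_{ji}$. Then
$g(x,y,u \mid n, r_i, \beta_{ij}; r_j, \beta_{ji})$ is

$$\exp\!\left(-\frac{r_i\beta_{ij}}{n-1} - \frac{r_j\beta_{ji}}{n-1}\right)
+ \exp\!\left(-\frac{r_i\beta_{ij}}{n-1}\right)\left[1 - \exp\!\left(-\frac{r_j\beta_{ji}}{n-1}\right)\right] x$$
$$+ \left[1 - \exp\!\left(-\frac{r_i\beta_{ij}}{n-1}\right)\right]\exp\!\left(-\frac{r_j\beta_{ji}}{n-1}\right) y
+ \left[1 - \exp\!\left(-\frac{r_i\beta_{ij}}{n-1}\right)\right]\left[1 - \exp\!\left(-\frac{r_j\beta_{ji}}{n-1}\right)\right] u.$$

In the limit of large n,

$$g(x,y,u \mid n, r_i, \beta_{ij}; r_j, \beta_{ji}) = 1 + \frac{r_j\beta_{ji}(x-1)}{n-1} + \frac{r_i\beta_{ij}(y-1)}{n-1} + o(n^{-1}),$$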

so undirected edges disappear in the limit of a large population. Given $r_i$ and
n, the conditional pgf for the number of incoming, outgoing, and undirected
edges incident to i that appear between i and a node $j \neq i$ can be found by
integrating over the distribution of possible $\beta_{ij}$, $\beta_{ji}$, and $r_j$:

$$g(x,y,u \mid n, r_i) = \int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty g(x,y,u \mid n, r_i, \beta_{ij}; r_j, \beta_{ji})\, dF(\beta_{ij}, \beta_{ji})\, dF(r_j)$$
$$= 1 + \frac{E[r]E[\beta](x-1) + r_i E[\beta](y-1)}{n-1} + o(n^{-1}).$$

Given $r_i$ and n, the conditional pgf for the total number of incoming and
outgoing edges incident to i in the percolation network is

$$\left(1 + \frac{E[r]E[\beta](x-1) + r_i E[\beta](y-1)}{n-1} + o(n^{-1})\right)^{n-1}.$$

In the limit of large n, this converges to

$$G(x,y,u \mid r_i) = e^{E[r]E[\beta](x-1) + r_i E[\beta](y-1)}.$$

Therefore, the pgf for the degree distribution of the percolation network in the
limit of a large population is

$$G(x,y,u) = \int_0^\infty e^{E[r]E[\beta](x-1) + r_i E[\beta](y-1)}\, dF(r_i).$$

Several results follow immediately from inspection of this function: First,
undirected edges vanish in the limit of a large population, leaving a purely
directed epidemic percolation network. Second, the indegree and outdegree
of nodes in the epidemic percolation network are independent. Third, the
indegree has a Poisson distribution with mean $E[r]E[\beta]$. Finally, the outdegree
has a conditional Poisson distribution for any given recovery period r. The
mean outdegree is $E[r]E[\beta]$ as required, but the outdegree distribution is not
necessarily Poisson. For example, if $r \sim$ exponential($\lambda$), then the outdegree
has a geometric($\frac{\lambda}{\lambda + E[\beta]}$) distribution. More generally, if
$r \sim$ gamma($\alpha, \lambda$), then the outdegree has a negative
binomial($\alpha, \frac{\lambda}{\lambda + E[\beta]}$) distribution.
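The exponential example can be checked numerically. The sketch below is not part of the original analysis; it assumes that exponential($\lambda$) is parameterized by its rate, that the geometric distribution counts failures (support 0, 1, 2, ...), and it uses hypothetical function names. It integrates the conditional Poisson probability over the recovery-period density with a midpoint rule and compares the result with the geometric probability mass function.

```python
import math

def outdegree_pmf_numeric(k, lam, mean_beta, steps=20_000):
    """P(outdegree = k) when r ~ exponential(rate lam), by midpoint integration."""
    upper = 50.0 / (lam + mean_beta)        # integrand is negligible beyond this
    h = upper / steps
    total = 0.0
    for s in range(steps):
        r = (s + 0.5) * h
        mu = r * mean_beta                  # conditional Poisson mean r * E[beta]
        poisson = math.exp(-mu) * mu ** k / math.factorial(k)
        total += lam * math.exp(-lam * r) * poisson
    return total * h

def outdegree_pmf_geometric(k, lam, mean_beta):
    """Geometric pmf with success probability lam / (lam + E[beta])."""
    p = lam / (lam + mean_beta)
    return p * (1.0 - p) ** k
```

With `lam = 1.0` and `mean_beta = 2.0`, the two functions should agree to within the integration error for small k.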
2.1 Proportionate mixing 

A useful generalization of the SIR model with random mixing is to allow the
population to be composed of K distinct subpopulations, where each
subpopulation k constitutes a proportion $w_k$ of the overall population. Let
$F_k(r)$ be the cumulative distribution function for the recovery period of nodes
in subpopulation k. In addition, let each subpopulation k have a relative
infectiousness $\alpha_k$ and a relative susceptibility $\gamma_k$. If nodes i and
j are in subpopulations $k_i$ and $k_j$, then the infectious contact rate from i
to j is $\alpha_{k_i}\beta_{ij}\gamma_{k_j}$ and the infectious contact rate from j
to i is $\alpha_{k_j}\beta_{ji}\gamma_{k_i}$, where $\beta_{ij}$ and $\beta_{ji}$ have
a joint distribution function $F(\beta_{ij}, \beta_{ji})$ as before. This formulation
for an epidemic model with a heterogeneous population is called proportionate
mixing [7, 15]. Since the relative infectiousness and relative susceptibility are
each determined only up to a multiplicative constant, we assume without loss of
generality that

$$\sum_{k=1}^{K} w_k E[r_k]\alpha_k = \sum_{k=1}^{K} w_k \gamma_k = 1.$$

Let $g_{k_i k_j}(x,y,u \mid n, r_i, \beta_{ij}; r_j, \beta_{ji})$ be the conditional
pgf for the number of incoming, outgoing, and undirected edges incident to i
that appear between nodes i and j in subpopulations $k_i$ and $k_j$ in the
percolation network given $\beta_{ij}$, $\beta_{ji}$, $r_i$, $r_j$, and n. Then
$g_{k_i k_j}(x,y,u \mid n, r_i, \beta_{ij}; r_j, \beta_{ji})$ equals

$$1 + \frac{r_j\alpha_{k_j}\beta_{ji}\gamma_{k_i}(x-1)}{n-1} + \frac{r_i\alpha_{k_i}\beta_{ij}\gamma_{k_j}(y-1)}{n-1} + o(n^{-1}).$$

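Let $g_{k_i}(x,y,u \mid r_i, n)$ be the conditional pgf for the number of incoming,
outgoing, and undirected edges incident to i that appear between i and a node
$j \neq i$ given $k_i$, $r_i$, and n. Then

$$g_{k_i}(x,y,u \mid r_i, n) = \sum_{k_j=1}^{K} \int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty w_{k_j}\, g_{k_i k_j}(x,y,u \mid n, r_i, \beta_{ij}; r_j, \beta_{ji})\, dF(\beta_{ij}, \beta_{ji})\, dF_{k_j}(r_j)$$
$$= \sum_{k_j=1}^{K} w_{k_j}\left[1 + \frac{E[r_{k_j}]\alpha_{k_j}E[\beta]\gamma_{k_i}(x-1) + r_i\alpha_{k_i}E[\beta]\gamma_{k_j}(y-1)}{n-1} + o(n^{-1})\right].$$

The conditional pgf for the total number of incoming, outgoing, and undirected
edges incident to node i given $r_i$ and n is

$$\prod_{k=1}^{K}\left(1 + \frac{E[r_k]\alpha_k E[\beta]\gamma_{k_i}(x-1) + r_i\alpha_{k_i}E[\beta]\gamma_k(y-1)}{n-1} + o(n^{-1})\right)^{w_k(n-1)}.$$

In the limit of large n, this becomes

$$g_{k_i}(x,y,u \mid r_i) = \prod_{k=1}^{K} e^{w_k\left[E[r_k]\alpha_k E[\beta]\gamma_{k_i}(x-1) + r_i\alpha_{k_i}E[\beta]\gamma_k(y-1)\right]}
= e^{E[\beta]\gamma_{k_i}(x-1) + r_i\alpha_{k_i}E[\beta](y-1)}.$$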

Integrating over the distribution of infectious periods in subpopulation $k_i$ yields

$$G_{k_i}(x,y,u) = \int_0^\infty g_{k_i}(x,y,u \mid r_i)\, dF_{k_i}(r_i)
= \int_0^\infty e^{E[\beta]\gamma_{k_i}(x-1) + r_i\alpha_{k_i}E[\beta](y-1)}\, dF_{k_i}(r_i).$$

Note that the indegree of nodes in subpopulation $k_i$ has a Poisson distribution,
the outdegree distribution is a mixture of Poisson distributions, and the
indegree and outdegree are independent within subpopulation $k_i$. Finally, the
pgf for the degree distribution of the epidemic percolation network in the limit
of a large population is:

$$G(x,y,u) = \sum_{k=1}^{K} w_k G_k(x,y,u).$$

The proportionate mixing assumption has two important consequences that
enormously simplify the analysis of the epidemic percolation network: First, the
probability that an edge terminates at a node in subpopulation k is proportional
to the expected indegree of subpopulation k and independent of the node at
which the edge began. Second, the probability that an edge originates at a node
in subpopulation k is proportional to the expected outdegree of subpopulation
k and independent of the node at which the edge terminates. To prove the first,
we observe that the total number of directed edges to nodes in subpopulation
$k_0$ from a node i in subpopulation $k_i$ with recovery period $r_i$ is a sum of
$w_{k_0} n$ iid Bernoulli random variables with mean

$$\frac{r_i\alpha_{k_i}E[\beta]\gamma_{k_0}}{n-1} + o(n^{-1}).$$

In the limit of large n, the number of outgoing edges from node i to nodes in
subpopulation $k_0$ has a Poisson distribution with mean

$$w_{k_0}\, r_i\alpha_{k_i}E[\beta]\gamma_{k_0}.$$

The total number of outgoing edges from node i has a Poisson distribution with
mean $r_i\alpha_{k_i}E[\beta]$, because it is a sum of K Poisson random variables
with means $w_k(r_i\alpha_{k_i}E[\beta]\gamma_k)$, k = 1,...,K. By the strong law of
large numbers, the proportion of these edges that terminate at nodes in
subpopulation $k_0$ converges almost surely to $w_{k_0}\gamma_{k_0}$, which is
proportional to the expected indegree of subpopulation $k_0$ and independent of
$k_i$ and $r_i$. A similar argument shows that the proportion of incoming edges to
node i that originate at nodes in subpopulation $k_0$ converges almost surely to
$w_{k_0}E[r_{k_0}]\alpha_{k_0}$, which is proportional to the expected outdegree of
subpopulation $k_0$ and independent of $k_i$ and $r_i$.

3 Components of epidemic percolation networks 

Methods of calculating the size distribution of small components, the percolation 
threshold, and the proportion of a network contained in the GIN, the GOUT, 
and the GSCC for semi-directed networks with arbitrary degree distributions 
have been developed by Boguñá and Serrano [3] and Meyers, Newman, and
Pourbohloul [4]. For purely directed and purely undirected networks, these 
methods simplify to equations derived by Newman, Strogatz, and Watts [2, 10- 
12]. In this section, we outline the methods for semi-directed networks and show 
how they simplify in the case of purely directed networks. This discussion is 
adapted from our previous paper [5] and introduces notation that will be used 
in the rest of this paper. For readers who desire an introduction to random 
graphs and percolation on networks, we recommend Albert and Barabási [10]
and Newman [12]. 

The networks considered here have no small loops and no two-point degree 
correlations (i.e. the degree of a node reached by following an edge forward or 
backward is independent of the degree of the node from which we start). As 
shown above, this is sufficient for models with random and proportionate mixing. 
Since these methods assume no clustering of contacts, they do not apply to 
epidemics on networks with spatial structure [6, 16], small-world networks [19],
or other clustered networks [20,21]. The development of methods for clustered
networks is an area of active research [22]. Nonetheless, the isomorphism to 
an epidemic percolation network is valid for any time-homogeneous SIR model, 
including models that cannot be analyzed via the generating function formalism 
outlined here. 

If a, b, and c are nonnegative integers, let $G^{(a,b,c)}(x,y,u)$ be the derivative
obtained after differentiating a times with respect to x, b times with respect
to y, and c times with respect to u. Then the mean indegree of the epidemic
percolation network is $G^{(1,0,0)}(1,1,1)$ and the mean outdegree is
$G^{(0,1,0)}(1,1,1)$. Let $\langle k_d \rangle$ denote the common mean of the directed
degrees. The mean undirected degree is $\langle k_u \rangle = G^{(0,0,1)}(1,1,1)$. For
the epidemic percolation network for the homogeneous SIR model with random
mixing, $\langle k_d \rangle = E[r]E[\beta]$ and $\langle k_u \rangle = 0$.

Let $G_f(x,y,u)$ be the pgf for the degree distribution of a node reached
by going forward along a directed edge, excluding the edge used to reach the
node. Since the probability of reaching any node by following a directed edge
is proportional to its indegree,

$$G_f(x,y,u) = \frac{1}{\langle k_d \rangle}\, G^{(1,0,0)}(x,y,u).$$

Similarly, the pgf for the degree distribution of a node reached by going in
reverse along a directed edge (excluding the edge used to reach it) is

$$G_r(x,y,u) = \frac{1}{\langle k_d \rangle}\, G^{(0,1,0)}(x,y,u),$$

and the pgf for the degree distribution of a node reached by following an
undirected edge (excluding the edge used to reach it) is

$$G_u(x,y,u) = \frac{1}{\langle k_u \rangle}\, G^{(0,0,1)}(x,y,u).$$

The above definitions require that $\langle k_d \rangle > 0$ and $\langle k_u \rangle > 0$.
In a purely undirected network (i.e. $\langle k_d \rangle = 0$), we arbitrarily set
$G_f(x,y,u) = G_r(x,y,u) = 1$ for all x, y, and u. In a purely directed network
(i.e. $\langle k_u \rangle = 0$), we arbitrarily set $G_u(x,y,u) = 1$ for all x, y, and u.

3.1 Out-components

Let $H_f^{\mathrm{out}}(z)$ be the pgf for the size of the out-component at the end of
a directed edge and $H_u^{\mathrm{out}}(z)$ be the pgf for the size of the
out-component at the "end" of an undirected edge. Then, in the limit of a large
population,

$$H_f^{\mathrm{out}}(z) = z\, G_f\big(1, H_f^{\mathrm{out}}(z), H_u^{\mathrm{out}}(z)\big), \tag{2a}$$
$$H_u^{\mathrm{out}}(z) = z\, G_u\big(1, H_f^{\mathrm{out}}(z), H_u^{\mathrm{out}}(z)\big). \tag{2b}$$



The pgf for the out-component size of a randomly chosen node is

$$H^{\mathrm{out}}(z) = z\, G\big(1, H_f^{\mathrm{out}}(z), H_u^{\mathrm{out}}(z)\big). \tag{3}$$

Given power series for $H_f^{\mathrm{out}}(z)$ and $H_u^{\mathrm{out}}(z)$ that are
accurate to $z^n$, equations (2a) and (2b) can be used to obtain series that are
accurate to $z^{n+1}$. The coefficients on $z$ in $H_f^{\mathrm{out}}(z)$ and
$H_u^{\mathrm{out}}(z)$ are $G_f(1,0,0)$ and $G_u(1,0,0)$ respectively. Therefore,
power series for $H_f^{\mathrm{out}}(z)$ and $H_u^{\mathrm{out}}(z)$ can be computed
to any desired order. With power series for $H_f^{\mathrm{out}}(z)$ and
$H_u^{\mathrm{out}}(z)$ that are accurate to $z^n$, equation (3) can be used to
obtain a power series for $H^{\mathrm{out}}(z)$ that is accurate to $z^{n+1}$. For
any $z \in [0,1]$, $H_f^{\mathrm{out}}(z)$ and $H_u^{\mathrm{out}}(z)$ can be
calculated with arbitrary precision by iterating equations (2a) and (2b) starting
from initial values $y_0, u_0 \in [0,1)$. Estimates of $H_f^{\mathrm{out}}(z)$ and
$H_u^{\mathrm{out}}(z)$ can be used to obtain estimates of $H^{\mathrm{out}}(z)$
with arbitrary precision.

In a purely directed network, $H_u^{\mathrm{out}}(z) = 1$ for all z because
$G_u(x,y,u) = 1$ for all x, y, and u. Thus
$H_f^{\mathrm{out}}(z) = z\,G_f(1, H_f^{\mathrm{out}}(z), 1)$ and
$H^{\mathrm{out}}(z) = z\,G(1, H_f^{\mathrm{out}}(z), 1)$.
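The power-series iteration can be illustrated for the simplest purely directed case. The sketch below is an example under stated assumptions rather than the procedure used for the results in this paper: it assumes homogeneous random mixing with a fixed recovery period, so that the outdegree is Poisson with mean `mean_degree` $= E[r]E[\beta]$ and $G_f(1,y,1) = G(1,y,1) = e^{\langle k_d\rangle (y-1)}$, which makes the series for the out-component of a random node coincide with the series at the end of a directed edge. The function names are hypothetical. Each pass of the iteration fixes one more coefficient, as described above.

```python
import math

def series_exp(f, order):
    """Coefficients of exp(f(z)) up to z**order, for a series f with f[0] == 0."""
    e = [1.0] + [0.0] * order
    for n in range(1, order + 1):
        # From E' = F' * E:  n * e_n = sum_{k=1..n} k * f_k * e_{n-k}
        e[n] = sum(k * f[k] * e[n - k] for k in range(1, n + 1)) / n
    return e

def outbreak_size_pmf(mean_degree, order):
    """P(out-component size = s) for s = 0..order, assuming Poisson outdegree."""
    h = [0.0] * (order + 1)                      # start from the zero series
    for _ in range(order):                       # one more exact coefficient per pass
        e = series_exp([mean_degree * c for c in h], order)
        g = [math.exp(-mean_degree) * c for c in e]   # exp(mean_degree * (h(z) - 1))
        h = [0.0] + g[:order]                    # multiply by z and truncate
    return h
```

Under these assumptions the coefficients reproduce the Borel distribution, $e^{-\mu s}(\mu s)^{s-1}/s!$ with $\mu$ the mean degree, which is the known total-progeny distribution of a Poisson branching process.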

In the limit of a large population, the probability that a node has a finite
out-component is $H^{\mathrm{out}}(1)$, so the probability that a randomly chosen
node is in the GIN is $1 - H^{\mathrm{out}}(1)$. The expected size of the
out-component of a randomly chosen node is ${H^{\mathrm{out}}}'(1)$. Taking
derivatives in equation (3) yields

$${H^{\mathrm{out}}}'(1) = 1 + \langle k_d \rangle\, {H_f^{\mathrm{out}}}'(1) + \langle k_u \rangle\, {H_u^{\mathrm{out}}}'(1). \tag{4}$$

Taking derivatives in equations (2a) and (2b) and using the fact that
$H_f^{\mathrm{out}}(1) = H_u^{\mathrm{out}}(1) = 1$ below the epidemic threshold
yields a set of linear equations for ${H_f^{\mathrm{out}}}'(1)$ and
${H_u^{\mathrm{out}}}'(1)$. These can be solved to yield

$${H_f^{\mathrm{out}}}'(1) = \frac{1 + G_f^{(0,0,1)} - G_u^{(0,0,1)}}{(1 - G_f^{(0,1,0)})(1 - G_u^{(0,0,1)}) - G_f^{(0,0,1)} G_u^{(0,1,0)}} \tag{5}$$

and

$${H_u^{\mathrm{out}}}'(1) = \frac{1 + G_u^{(0,1,0)} - G_f^{(0,1,0)}}{(1 - G_f^{(0,1,0)})(1 - G_u^{(0,0,1)}) - G_f^{(0,0,1)} G_u^{(0,1,0)}}, \tag{6}$$

where the argument of all derivatives is (1, 1, 1). In a purely directed network,
all derivatives involving $G_u$ are zero, so

$${H_f^{\mathrm{out}}}'(1) = \frac{1}{1 - G_f^{(0,1,0)}}$$

and

$${H^{\mathrm{out}}}'(1) = 1 + \langle k_d \rangle\, {H_f^{\mathrm{out}}}'(1).$$

3.2 In-components 

The in-component size distribution of a semi-directed network can be derived
using the same logic used to find the out-component size distribution, except
that we consider going backwards along edges. Let $H_r^{\mathrm{in}}(z)$ be the pgf
for the size of the in-component at the beginning of a directed edge,
$H_u^{\mathrm{in}}(z)$ be the pgf for the size of the in-component at the
"beginning" of an undirected edge, and $H^{\mathrm{in}}(z)$ be the pgf for the
in-component size of a randomly chosen node. Then

$$H_r^{\mathrm{in}}(z) = z\, G_r\big(H_r^{\mathrm{in}}(z), 1, H_u^{\mathrm{in}}(z)\big), \tag{7a}$$
$$H_u^{\mathrm{in}}(z) = z\, G_u\big(H_r^{\mathrm{in}}(z), 1, H_u^{\mathrm{in}}(z)\big), \tag{7b}$$
$$H^{\mathrm{in}}(z) = z\, G\big(H_r^{\mathrm{in}}(z), 1, H_u^{\mathrm{in}}(z)\big). \tag{7c}$$

Power series to arbitrary degrees and numerical estimates with arbitrary
precision can be obtained for $H_r^{\mathrm{in}}(z)$, $H_u^{\mathrm{in}}(z)$, and
$H^{\mathrm{in}}(z)$ by iterating these equations in the manner described for
$H_f^{\mathrm{out}}(z)$, $H_u^{\mathrm{out}}(z)$, and $H^{\mathrm{out}}(z)$. In a
purely directed network, $H_u^{\mathrm{in}}(z) = 1$ for all z because
$G_u(x,y,u) = 1$ for all x, y, and u. Thus
$H_r^{\mathrm{in}}(z) = z\,G_r(H_r^{\mathrm{in}}(z), 1, 1)$ and
$H^{\mathrm{in}}(z) = z\,G(H_r^{\mathrm{in}}(z), 1, 1)$.

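In the limit of a large population, the probability that a node has a finite
in-component is $H^{\mathrm{in}}(1)$, so the probability that a randomly chosen
node is in the GOUT is $1 - H^{\mathrm{in}}(1)$. The expected size of the
in-component of a randomly chosen node is ${H^{\mathrm{in}}}'(1)$. Taking
derivatives in equation (7c) yields

$${H^{\mathrm{in}}}'(1) = 1 + \langle k_d \rangle\, {H_r^{\mathrm{in}}}'(1) + \langle k_u \rangle\, {H_u^{\mathrm{in}}}'(1). \tag{8}$$

Taking derivatives in equations (7a) and (7b) and using the fact that
$H_r^{\mathrm{in}}(1) = H_u^{\mathrm{in}}(1) = 1$ in a subcritical network yields

$${H_r^{\mathrm{in}}}'(1) = \frac{1 + G_r^{(0,0,1)} - G_u^{(0,0,1)}}{(1 - G_r^{(1,0,0)})(1 - G_u^{(0,0,1)}) - G_r^{(0,0,1)} G_u^{(1,0,0)}} \tag{9}$$

and

$${H_u^{\mathrm{in}}}'(1) = \frac{1 + G_u^{(1,0,0)} - G_r^{(1,0,0)}}{(1 - G_r^{(1,0,0)})(1 - G_u^{(0,0,1)}) - G_r^{(0,0,1)} G_u^{(1,0,0)}}, \tag{10}$$

where the argument of all derivatives is (1, 1, 1). In a purely directed network,
all derivatives involving $G_u$ are zero, so

$${H_r^{\mathrm{in}}}'(1) = \frac{1}{1 - G_r^{(1,0,0)}}$$

and

$${H^{\mathrm{in}}}'(1) = 1 + \langle k_d \rangle\, {H_r^{\mathrm{in}}}'(1).$$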

3.3 Epidemic threshold 

The epidemic threshold occurs when the expected size of the in- and
out-components in the network becomes infinite. Equations (5) and (6) show that
the mean out-component size becomes infinite when

$$(1 - G_f^{(0,1,0)})(1 - G_u^{(0,0,1)}) - G_f^{(0,0,1)} G_u^{(0,1,0)} = 0,$$

and equations (9) and (10) show that the mean in-component size becomes
infinite when

$$(1 - G_r^{(1,0,0)})(1 - G_u^{(0,0,1)}) - G_r^{(0,0,1)} G_u^{(1,0,0)} = 0.$$

From the definitions of $G_f(x,y,u)$, $G_r(x,y,u)$ and $G_u(x,y,u)$, both conditions
are equivalent to

$$\left(1 - \frac{1}{\langle k_d \rangle} G^{(1,1,0)}\right)\left(1 - \frac{1}{\langle k_u \rangle} G^{(0,0,2)}\right) - \frac{1}{\langle k_d \rangle \langle k_u \rangle}\, G^{(1,0,1)} G^{(0,1,1)} = 0.$$

Therefore, there is a single epidemic threshold where the GSCC, the GIN, and
the GOUT appear simultaneously.

In a purely directed network, the condition for this epidemic threshold is
much simpler because all derivatives with respect to u and all derivatives of
$G_u$ are zero: The mean out-component size becomes infinite when
$1 - G_f^{(0,1,0)} = 0$ and the mean in-component size becomes infinite when
$1 - G_r^{(1,0,0)} = 0$. Both of these conditions are equivalent to
$1 - \frac{1}{\langle k_d \rangle} G^{(1,1,0)} = 0$.
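The purely directed threshold condition can be worked out explicitly for proportionate mixing. The derivation below is a sketch that assumes the subpopulation pgfs of Section 2.1, with a Poisson indegree of mean $E[\beta]\gamma_k$ and a conditional Poisson outdegree of mean $r\alpha_k E[\beta]$ in subpopulation k, together with the normalization $\sum_k w_k E[r_k]\alpha_k = \sum_k w_k\gamma_k = 1$; the variable u is dropped because the network is purely directed.

```latex
% Assumption: G_k(x,y) = e^{E[\beta]\gamma_k (x-1)}
%             \int_0^\infty e^{r \alpha_k E[\beta] (y-1)} dF_k(r)
\begin{align*}
G(x,y) &= \sum_{k=1}^{K} w_k\, G_k(x,y),\\
\langle k_d\rangle &= G^{(1,0)}(1,1) = E[\beta]\sum_{k=1}^{K} w_k\gamma_k = E[\beta],\\
G^{(1,1)}(1,1) &= \sum_{k=1}^{K} w_k\,\big(E[\beta]\gamma_k\big)\big(E[r_k]\alpha_k E[\beta]\big)
  = E[\beta]^2 \sum_{k=1}^{K} w_k\,\alpha_k\gamma_k E[r_k],\\
1 - \frac{1}{\langle k_d\rangle}\,G^{(1,1)}(1,1) = 0
  &\iff E[\beta]\sum_{k=1}^{K} w_k\,\alpha_k\gamma_k E[r_k] = 1 .
\end{align*}
```

With a single subpopulation the normalization forces $\gamma_1 = 1$ and $E[r_1]\alpha_1 = 1$, and the condition reduces to $\langle k_d \rangle = 1$.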

3.4 Giant strongly-connected component 

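In the limit of a large population, a node is in the GSCC if and only if its in-
and out-components are both infinite. A randomly chosen node has a finite
in-component with probability $G(H_r^{\mathrm{in}}(1), 1, H_u^{\mathrm{in}}(1))$ and a
finite out-component with probability
$G(1, H_f^{\mathrm{out}}(1), H_u^{\mathrm{out}}(1))$. The probability that a node
reached by following an undirected edge has finite in- and out-components is the
solution to the equation

$$v = G_u\big(H_r^{\mathrm{in}}(1), H_f^{\mathrm{out}}(1), v\big),$$

and the probability that a randomly chosen node has finite in- and
out-components is $G(H_r^{\mathrm{in}}(1), H_f^{\mathrm{out}}(1), v)$ [3]. Thus, the
relative size of the GSCC is

$$1 - G\big(H_r^{\mathrm{in}}(1), 1, H_u^{\mathrm{in}}(1)\big) - G\big(1, H_f^{\mathrm{out}}(1), H_u^{\mathrm{out}}(1)\big) + G\big(H_r^{\mathrm{in}}(1), H_f^{\mathrm{out}}(1), v\big).$$

In a purely directed network, this simplifies to

$$1 - G\big(H_r^{\mathrm{in}}(1), 1\big) - G\big(1, H_f^{\mathrm{out}}(1)\big) + G\big(H_r^{\mathrm{in}}(1), H_f^{\mathrm{out}}(1)\big).$$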
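The purely directed expressions can be evaluated numerically by fixed-point iteration from a starting value in [0, 1). The sketch below is illustrative and rests on assumptions not made in this section: homogeneous random mixing with an exponentially distributed recovery period, so that $G(x,y) = A(x)B(y)$ with Poisson indegree pgf $A(x) = e^{\mu(x-1)}$ and geometric outdegree pgf $B(y) = 1/(1 + \mu(1-y))$, where $\mu = E[r]E[\beta]$. Under these assumptions $G_r(x,1) = A(x)$ and $G_f(1,y) = B(y)$. The function names are hypothetical.

```python
import math

def fixed_point(func, start=0.0, tol=1e-12, max_iter=1_000_000):
    """Iterate value -> func(value) from `start` until successive values agree."""
    value = start
    for _ in range(max_iter):
        new_value = func(value)
        if abs(new_value - value) < tol:
            return new_value
        value = new_value
    return value

def giant_component_sizes(mean_degree):
    """Relative sizes (GIN, GOUT, GSCC) under the assumptions stated above."""
    A = lambda x: math.exp(mean_degree * (x - 1.0))        # indegree pgf
    B = lambda y: 1.0 / (1.0 + mean_degree * (1.0 - y))    # outdegree pgf
    x = fixed_point(A)       # probability of a finite in-component along an edge
    y = fixed_point(B)       # probability of a finite out-component along an edge
    gin = 1.0 - B(y)         # 1 - G(1, y): probability of an epidemic
    gout = 1.0 - A(x)        # 1 - G(x, 1): relative final size of an epidemic
    gscc = 1.0 - A(x) - B(y) + A(x) * B(y)
    return gin, gout, gscc
```

Because the indegree and outdegree are independent in this example, the GSCC size factorizes into the product of the GIN and GOUT sizes. For `mean_degree = 2.0` the probability of an epidemic is 1 - 1/2 = 0.5, while the final size solves $z = 1 - e^{-2z}$.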
4 Simulations 

The following series of simulations provides some examples of how an epidemic 
percolation network can be derived from an SIR model with random or propor- 
tionate mixing and used to analyze it. There are three series of homogeneous 
population models and three series of heterogeneous population models. Pre- 
dictions for the mean size of outbreaks and the probability and final size of an 
epidemic were easily obtained and consistently accurate. Models were run on
Berkeley Madonna 8.0.1 (©1997-2000 Robert I. Macey & George F. Oster) and
Mathematica 5.0.0.0 (©1988-2003 Wolfram Research, Inc.). 

All simulations began with a single imported infection randomly chosen from 
the population. Simulations in Berkeley Madonna used a Poisson approxima- 
tion to the number of new infections in each time step dt, with dt = .005. 
Simulations in Mathematica were based on the general stochastic SIR model 
from the Appendix: A recovery time for each infected individual was sampled 
from the appropriate distribution. When person $i$ was infected, an infectious
contact interval for each ordered pair $ij$, $j \neq i$, was sampled from the
appropriate distribution, and the corresponding infectious contact times were
calculated.
The minimum infectious contact time for each susceptible individual was stored. 
The next infection occurred in the susceptible with the smallest infectious con- 
tact time. The epidemic ended when the minimum infectious contact time 
among the remaining susceptibles was infinite. 
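The time-stepped approach can be illustrated with a short Python sketch. It is not the Berkeley Madonna model itself: it assumes exponentially distributed recovery with mean one, draws the number of new infections in each step from a Poisson distribution, and handles recoveries with independent Bernoulli draws. The function names and the small population size used in the example are assumptions.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small means here."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_outbreak(n, r0, dt=0.005, seed=None):
    """Number of persons ever infected in a time-stepped SIR model with
    exponential recovery (mean one) and a single initial infection."""
    rng = random.Random(seed)
    s, i = n - 1, 1
    p_recover = 1.0 - math.exp(-dt)
    while i > 0:
        rate = r0 * i * s / (n - 1)      # total hazard of new infections
        new_inf = min(s, poisson_draw(rate * dt, rng))
        recovered = sum(rng.random() < p_recover for _ in range(i))
        s -= new_inf
        i += new_inf - recovered
    return n - s

size = simulate_outbreak(200, 2.0, seed=1)
```

Repeating the call with different seeds and counting the runs whose size exceeds a chosen fraction of `n` gives an empirical epidemic probability.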

An epidemic was defined to be an outbreak that infected more than 10% or
15% of the population. These percentages were chosen to obtain an outbreak
size much larger than the expected size of self-limited outbreaks and much
smaller than the expected size of an epidemic. For $R_0$ near one, the mean
self-limited outbreak size increases and the expected size of an epidemic
decreases, leading to poor separation between self-limited outbreaks and
epidemics. Since the mean size of self-limited outbreaks approaches a constant
and the mean size of an epidemic scales with the population size, this
separation can be restored by taking a larger population size. However, the
computational time required for an exact simulation varies roughly with the
square of the population size. Thus, we did not attempt simulations for $R_0$
below 1.25 or 1.5.

In this section and the remainder of the paper, we deal exclusively with 
purely directed epidemic percolation networks. To simplify notation, we drop 
the variable $u$ from $G(x, y, u)$.

4.1 Homogeneous populations 

The first series of homogeneous population models had a fixed recovery time, 
the second series had exponentially-distributed recovery times, and the third 
series had ten different recovery time distributions. All models had a single 
imported infection randomly chosen from the population, a mean recovery time 
of one, and a basic reproductive number of $R_0$.

When the recovery time is fixed, the pgf for the degree distribution of the
epidemic percolation network is $G(x, y) = e^{R_0(x-1)} e^{R_0(y-1)}$, so the
indegree and outdegree have independent Poisson distributions with mean $R_0$.
The model was run at $R_0 = 1.25$, 1.5, 2, 2.5, 3, 4, and 5. At each $R_0$, the
model was run 10,000 times in a population of 10,000 individuals.

When the recovery time is exponentially distributed, the pgf for the degree
distribution of the epidemic percolation network is

\[ G(x, y) = \frac{e^{R_0(x-1)}}{1 - R_0(y-1)}, \]

so the indegree has a Poisson distribution and the outdegree has a geometric
distribution. The mean indegree and outdegree are both $R_0$. The model was
run at $R_0 = 1.5$, 2, 2.5, 3, 4, 5, 6, 7, 8, 9, and 10. At each $R_0$, the
model was run 10,000 times with a population of 10,000 individuals.

Details of the recovery time distributions for the third series of simulations
are shown in Table 1. As in the other homogeneous population models, the
indegree and outdegree are independent. The pgf for the indegree is
$e^{R_0(x-1)}$. The pgf for the outdegree is

\[ G^{\mathrm{out}}(y) = \int_0^\infty e^{R_0 r (y-1)} \, dF(r), \]

where $F(r)$ is the cdf of the recovery time. The pgf for the degree
distribution of the epidemic percolation network is
$G(x, y) = e^{R_0(x-1)} G^{\mathrm{out}}(y)$. For each recovery time
distribution, models were run 2,000 times with a population of 1,000
individuals at $R_0 = 1.5$, 2, 2.5, 3, 4, and 5.

In all homogeneous population models, an epidemic was said to occur when
more than 10% of the population was infected. As outlined in Section 3, the
predicted probability of an epidemic is $1 - H^{\mathrm{out}}(1)$ and the
predicted final size of an epidemic is $1 - H^{\mathrm{in}}(1)$. The predicted
final size of an epidemic for a given $R_0$ was the same for all recovery time
distributions, but Figure 2 shows that the probability of an epidemic depends
on both $R_0$ and the recovery time distribution. The probability and final
size of an epidemic were equal only when the recovery time was fixed. For all
other recovery time distributions, the probability of an epidemic was less than
its final size (this inequality is proven in [5]). Figure 3 shows good
agreement between the predicted and observed probabilities of an epidemic and
Figure 4 shows good agreement between the observed and predicted final sizes
of epidemics.
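The predictions for the first two series can be reproduced with a few lines of Python. This is a minimal sketch, assuming independent indegree and outdegree so that the pgf along a forward edge equals the marginal outdegree pgf; the helper names are hypothetical and the fixed points are found by iteration from zero.

```python
import math

def smallest_fixed_point(g, tol=1e-12, max_iter=100000):
    """Iterate y <- g(y) from 0; for a pgf g this increases
    monotonically to the smallest fixed point in [0, 1]."""
    y = 0.0
    for _ in range(max_iter):
        y_new = g(y)
        if abs(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def epidemic_probability(out_pgf):
    """1 - H^out(1) when G_f(1, y) = G(1, y) = out_pgf(y)."""
    return 1.0 - smallest_fixed_point(out_pgf)

def final_size(r0):
    """1 - H^in(1) for a Poisson indegree with mean r0."""
    return 1.0 - smallest_fixed_point(lambda x: math.exp(r0 * (x - 1.0)))

r0 = 2.0
# Fixed recovery time: Poisson outdegree.
p_fixed = epidemic_probability(lambda y: math.exp(r0 * (y - 1.0)))
# Exponential recovery time: geometric outdegree.
p_expo = epidemic_probability(lambda y: 1.0 / (1.0 - r0 * (y - 1.0)))
size = final_size(r0)
```

With a fixed recovery time the two predictions coincide, while with an exponential recovery time the fixed-point equation is quadratic and the predicted probability reduces to $1 - 1/R_0$, which is smaller than the predicted final size.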

4.2 Heterogeneous populations 

In the first two series of heterogeneous population models, the population
consisted of two subpopulations A and B of equal size. The average number of
infectious contacts made by a member of subpopulation B during his or her
recovery period is $\lambda$. We assumed the following infectious contact rates
from $i$ to $j$ in a population of size $n$: $\frac{8}{3}\lambda(n-1)^{-1}$
when $i$ and $j$ are both members of subpopulation A,
$\frac{4}{3}\lambda(n-1)^{-1}$ when $i$ and $j$ are members of different
subpopulations, and $\frac{2}{3}\lambda(n-1)^{-1}$ when $i$ and $j$ are both in
subpopulation B. All models had a mean recovery time of one. With these
assumptions, the mean indegree and outdegree of subpopulation A were $2\lambda$
and the mean indegree and outdegree of subpopulation B were $\lambda$,
producing a positive correlation between susceptibility and infectiousness.

When the recovery period is fixed, the pgf for the degree distribution of the
epidemic percolation network is

\[ G(x, y) = 0.5 e^{2\lambda(x-1)} e^{2\lambda(y-1)} + 0.5 e^{\lambda(x-1)} e^{\lambda(y-1)}. \]

When the recovery period is exponentially distributed, the pgf for the degree
distribution of the epidemic percolation network is

\[ G(x, y) = 0.5 \frac{e^{2\lambda(x-1)}}{1 - 2\lambda(y-1)} + 0.5 \frac{e^{\lambda(x-1)}}{1 - \lambda(y-1)}. \]

Each model was run at $\lambda = .65$, .7, .8, .9, 1.0, 1.1, 1.2, 1.3, and 1.4.
At each $\lambda$, the model was run 10,000 times. For models with
$\lambda > .65$, the population size was 10,000. For $\lambda = .65$, the
population size was 100,000. When $\lambda = .65$, epidemics occur even though
the mean degree of the network is less than one. Both series of models were
implemented in Berkeley Madonna.
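The observation about $\lambda = .65$ follows from the moments of the degree distribution. The short Python check below assumes only that half of the nodes have mean indegree and outdegree $2\lambda$ and half have mean indegree and outdegree $\lambda$, with indegree and outdegree independent within each subpopulation; the function names are hypothetical.

```python
def mean_degree(lam):
    """<k_d> for two equal subpopulations with mean degrees 2*lam and lam."""
    return 0.5 * (2.0 * lam) + 0.5 * lam

def threshold_ratio(lam):
    """G^(1,1)(1,1) / <k_d>; an epidemic is possible when this exceeds one."""
    mixed_moment = 0.5 * (2.0 * lam) ** 2 + 0.5 * lam ** 2   # E[jk]
    return mixed_moment / mean_degree(lam)

below_one = mean_degree(0.65)        # 0.975
above_one = threshold_ratio(0.65)    # 5 * 0.65 / 3
```

The ratio equals $5\lambda/3$, so under these assumptions the threshold lies at $\lambda = .6$ and the mean degree at that point is only .9.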

A third series of simulations was conducted in populations with various mix- 
tures of recovery time distributions. The population had 1,000 individuals 
partitioned into subpopulations A and B of 500 individuals each. Each model 
was run under two scenarios: In the first scenario, subpopulation A is twice as 
infectious per unit time and has the same mean recovery time as subpopulation 
B. In the second scenario, both subpopulations are equally infectious per unit 
time but the mean recovery period of subpopulation A is twice as long as that 
of B. The degree distribution of the epidemic percolation network is identical 
under both scenarios. In the first scenario, models were run for all nine
possible combinations of the Fixed(1), Uniform(0, 2), and Exponential(1)
recovery time distributions. In the second scenario, models were run for all
nine possible combinations of Fixed(2), Uniform(0, 4), and Exponential(.5) in
subpopulation A and Fixed(1), Uniform(0, 2), and Exponential(1) in
subpopulation B. These recovery time distributions are described in Table 1.
Every model was run 5,000 times with $\lambda = 2$. An epidemic was defined as
an outbreak that infected more than 15% of the population. A similar set of
simulations was conducted where subpopulation A had a mean indegree of
$\lambda$ and a mean outdegree of $2\lambda$ while subpopulation B had a mean
indegree of $2\lambda$ and a mean outdegree of $\lambda$, producing a negative
correlation between infectiousness and susceptibility. All of these models
were implemented in Mathematica.

In the first two series of heterogeneous models, an epidemic was defined to
occur when more than 10% of the population was infected. The predicted
probability and final size of an epidemic are $1 - H^{\mathrm{out}}(1)$ and
$1 - H^{\mathrm{in}}(1)$, respectively, according to the epidemic percolation
network. According to a branching process approximation, the predicted
probability of an epidemic is $1 - h^{\mathrm{out}}(1)$, where

\[ h^{\mathrm{out}}(z) = zG(1, h^{\mathrm{out}}(z)), \]

and the predicted final size of an epidemic is $1 - h^{\mathrm{in}}(1)$, where

\[ h^{\mathrm{in}}(z) = zG(h^{\mathrm{in}}(z), 1). \]

Figures 5 and 6 compare the epidemic percolation network and branching process
predictions of the probability and final size of an epidemic. These models
have a positive correlation between infectiousness and susceptibility, so
persons infected through indigenous transmission are more infectious than
persons randomly selected from the population. Since the branching process
approximation implicitly assumes that persons infected through indigenous
transmission have the same outdegree distribution as the general population,
it consistently underestimates both the probability and final size of an
epidemic. The epidemic percolation network consistently predicts the correct
probability and final size of an epidemic.
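The gap between the two predictions can be reproduced numerically. The Python sketch below assumes the fixed-recovery-period pgf with mean degrees $2\lambda$ and $\lambda$ in two equal subpopulations; the forward-edge pgf follows from differentiating with respect to $x$ and dividing by the mean degree $1.5\lambda$. All function names are hypothetical.

```python
import math

def smallest_fixed_point(g, tol=1e-12, max_iter=1000000):
    """Iterate y <- g(y) from 0 to the smallest fixed point in [0, 1]."""
    y = 0.0
    for _ in range(max_iter):
        y_new = g(y)
        if abs(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def G_out(y, lam):
    """G(1, y): outdegree pgf of a randomly chosen node."""
    return 0.5 * math.exp(2 * lam * (y - 1)) + 0.5 * math.exp(lam * (y - 1))

def G_f_out(y, lam):
    """G_f(1, y) = G^(1,0)(1, y) / <k_d>: nodes reached along an edge
    are weighted by indegree, so subpopulation A has weight 2/3."""
    return (2 / 3) * math.exp(2 * lam * (y - 1)) + (1 / 3) * math.exp(lam * (y - 1))

def epn_probability(lam):
    """1 - H^out(1), with H^out(1) = G(1, H_f^out(1))."""
    v = smallest_fixed_point(lambda y: G_f_out(y, lam))
    return 1.0 - G_out(v, lam)

def branching_probability(lam):
    """1 - h^out(1), where h^out(1) solves y = G(1, y)."""
    return 1.0 - smallest_fixed_point(lambda y: G_out(y, lam))
```

For example, `epn_probability(1.0)` exceeds `branching_probability(1.0)`, and at `lam = 0.65` the branching process approximation predicts essentially no epidemics while the epidemic percolation network predicts a positive probability.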

Figure 7 shows a scatterplot of observed and predicted epidemic probabilities
for all heterogeneous population models. The predicted probability of an
epidemic for an initial case randomly chosen from the population is
$1 - H^{\mathrm{out}}(1)$. The "combined" points show the observed and
predicted epidemic probabilities for an initial case randomly chosen from the
overall population. Let $G_A(x, y)$ and $G_B(x, y)$ be the pgfs of the degree
distributions of nodes in subpopulations A and B respectively. The predicted
probability of an epidemic for an initial case chosen randomly from
subpopulation A is $1 - G_A(1, H_f^{\mathrm{out}}(1))$, and the predicted
probability of an epidemic for an initial case chosen randomly from
subpopulation B is $1 - G_B(1, H_f^{\mathrm{out}}(1))$. The "subpopulation A"
points show the observed and predicted epidemic probabilities for an initial
case randomly chosen from subpopulation A, and the "subpopulation B" points
show the observed and predicted epidemic probabilities for an initial case
randomly chosen from subpopulation B. All three sets of points are close to
the diagonal, showing that epidemic percolation networks accurately predicted
the probability of an epidemic. The predicted cumulative hazard of infection
in an epidemic is $-\ln(H^{\mathrm{in}}(1))$. Figure 8 shows a scatterplot of
the observed and predicted cumulative hazard of infection in an epidemic for
all heterogeneous population models. All points are close to the diagonal,
showing that epidemic percolation networks accurately predicted the final size
of epidemics.

Figure 9 shows a scatterplot of the observed and predicted mean size of
outbreaks in the third series of heterogeneous population models. The mean 
size of an outbreak started by a single, randomly chosen imported infection 
is The "combined" points show the observed and predicted mean 

outbreak size for an initial case randomly chosen from the overall population. 
The mean size of an outbreak started by an imported infection randomly chosen 
from subpopulation A is 

\[ \frac{\partial}{\partial z} G_A(1, H_f^{\mathrm{out}}(z)) \Big|_{z=1}, \]

and the mean size of an outbreak started by an imported infection randomly 
chosen from subpopulation B is 

The "subpopulation A" points show the observed and predicted mean outbreak
size for an initial case randomly chosen from subpopulation A, and the
"subpopulation B" points show the observed and predicted mean outbreak size
for an initial case randomly chosen from subpopulation B. All three sets of
points are close to the diagonal, showing that epidemic percolation networks
accurately predicted the mean sizes of finite epidemics in these models.
Models with higher epidemic probabilities tend to have smaller outbreak sizes
because large outbreaks are more likely to "explode" and become epidemics.

5 Equivalence to classical epidemic theory 

In this section, we show that epidemic percolation networks can reproduce much 
of the standard theory of epidemics. When the indegree and outdegree are in- 
dependent, the epidemic percolation networks predict the same distribution of 
outbreak sizes, epidemic threshold, and probability and final size of an epidemic 
as the forward and backward branching process approximations. Epidemic per- 
colation networks can also reproduce results from models developed specifically 
for special topics within infectious disease epidemiology. Below, we give exam- 
ples of the derivation of results from the epidemiology of sexually transmitted 
diseases, controlled diseases, and vector-borne diseases. 

5.1 Branching processes 

Much of the mathematical theory of epidemics has been derived using a branch- 
ing process as an approximation to the initial spread of disease, where the 
"offspring" of an individual are the persons he or she infects [7,15,18]. The 
branching process approximation remains accurate until the first time at which 
an infectious individual transmits infection to a person who has already been 
infected, which happens after an arbitrarily long interval in the limit of a large 
population [15]. We will call this the "forward" branching process approxi- 
mation to the initial spread of disease, to distinguish it from the "backward" 
branching process discussed later. 

If the offspring distribution of a branching process has the pgf $g(x)$, then
the probability that the branching process goes extinct is the smallest solution
in [0, 1] of the equation $x = g(x)$. If $h(z)$ is the pgf for the total number
of individuals generated by the branching process (including the initial
individual), then $h(z) = zg(h(z))$ [23].

Theorem 1 When an epidemic percolation network has independent indegree
and outdegree, it predicts exactly the same outbreak size distribution, epidemic
threshold, and probability of an epidemic (given the infection of a single
randomly chosen node) as a forward branching process approximation to the
initial spread of disease.

Proof. The pgf for the number of secondary infections produced by a randomly
chosen imported infection is $G(1, y)$. The number of secondary infections
produced by persons infected through indigenous transmission has the pgf

\[ G_f(1, y) = \frac{1}{\langle k_d \rangle} G^{(1,0)}(1, y) = G(1, y), \]

where the second equality follows from the independence of the indegree and
outdegree. Therefore, the initial spread of disease behaves like a branching
process whose offspring distribution has the pgf $g_f(y) = G(1, y)$. The
probability of a self-limited outbreak (i.e., no epidemic) given the infection
of a randomly chosen individual is the smallest solution in [0, 1] of
$y = G(1, y)$, which is equivalent to $y = g_f(y)$. Similarly,
$H_f^{\mathrm{out}}(z) = H^{\mathrm{out}}(z)$ and the pgf for outbreak sizes is
$H^{\mathrm{out}}(z) = zG(1, H^{\mathrm{out}}(z))$, which is equivalent to
$h(z) = zg_f(h(z))$. ■

Another important application of branching processes to SIR models with 
random mixing is the use of a "backward" branching process to predict the 
relative final size of an epidemic, in which the "offspring" for each individual i are 
the people who would make infectious contact with i if they were infectious. The 
relative final size of the epidemic is equal to the probability that the backward 
branching process never goes extinct [7]. 

Theorem 2 When an epidemic percolation network has independent indegree 
and outdegree, it predicts exactly the same relative final size of an epidemic as 
a backward branching process. 

Proof. In the epidemic percolation network, the number of offspring in the
backward branching process for a randomly chosen individual has the pgf
$G(x, 1)$. The pgf for the number of offspring of persons reached by going
in reverse along a directed edge is

\[ G_r(x, 1) = \frac{1}{\langle k_d \rangle} G^{(0,1)}(x, 1) = G(x, 1), \]

where the final equality follows from the independence of the indegree and
outdegree. Therefore, the process of moving backwards along edges in the
epidemic percolation network is a branching process whose offspring
distribution has the pgf $g_b(x) = G(x, 1)$. The probability that a node is not
infected in an epidemic is the smallest solution in [0, 1] of $x = G(x, 1)$,
which is equivalent to $x = g_b(x)$. Therefore, the epidemic percolation
network and the branching process predict the same relative final size of an
epidemic. ■

In an SIR model whose epidemic percolation network has independent inde- 
gree and outdegree, the epidemic percolation network is a simultaneous mapping 
of the forward and backward branching processes. However, a branching pro- 
cess assumes that the offspring distribution is the same in each generation of 
infection (because the same pgf $g(x)$ is used for each generation). This
assumption fails in an epidemic percolation network in which the indegree and
outdegree are not independent. 

Epidemic percolation networks generalize to models with arbitrary joint de- 
gree distributions because they allow the offspring distribution of the initial node 
to be different from the offspring distribution of all subsequent generations in 
the forward and backward branching processes. If we go forward along edges 
starting from a randomly chosen node, the offspring distribution of the initial 
node has the pgf $G(1, y)$ and the offspring distribution of nodes in all
subsequent generations has the pgf $G_f(1, y)$. If we go backward starting from
a randomly chosen node, then the offspring distribution of the initial node has
the pgf $G(x, 1)$ and all subsequent generations have the pgf $G_r(x, 1)$. When
the indegree and outdegree are not independent, $G(1, y) \neq G_f(1, y)$ and
$G(x, 1) \neq G_r(x, 1)$, so both branching process approximations break down.
We find it useful to think
of the equations for the component size distributions in Section 3 as describing 
(forward or backward) branching processes in which the initial node is allowed 
to have a different offspring distribution from all subsequent generations. 

Mapping the forward and backward infectious contact processes simultaneously
makes clear the crucial role of the GSCC in the emergence of epidemics.
In a forthcoming manuscript, we analyze a proportionate mixing model
with three subpopulations: One with the greatest probability of being in the 
GIN, one with the greatest probability of being in the GOUT, and one with the 
greatest probability of being in the GSCC. Vaccinating nodes in the subpopula- 
tion most likely to be in the GSCC is shown to be the most efficient strategy for 
reducing both the probability and final size of an epidemic despite the fact that 
such nodes were of average infectiousness and susceptibility. We have obtained 
similar results with network-based models. Nodes with a high probability of 
being in the GSCC are the "core group" that sustains transmission of infection 
in the population. If the forward and backward infectious contact processes 
are treated separately, the notion of the GSCC is lost. 

5.2 Other results of epidemic theory 

The use of probability generating functions on epidemic percolation networks 
allows many classical results from the theory of epidemics to be re-derived very 
easily. Below, we give derivations of results from three different areas of in- 
fectious disease epidemiology. The ability of epidemic percolation networks to 
encompass these results in a single conceptual framework is a striking demon- 
stration of their utility and generality. 

Example 1 (Sexually transmitted diseases) For many sexually transmitted
diseases, variation in levels of sexual activity affects the dynamics of disease
transmission. One important result is that $R_0$ for sexually transmitted
diseases depends on both the mean and the variance of the number of sexual
partners [18]. Let $I$ be a random variable representing the expected number of
sexual partners a person has during his or her recovery period. If the hazard
of infection is proportional to $I$, then

\[ R_0 = \frac{T}{\mu_I}\left[\mu_I^2 + \sigma_I^2\right], \]

where $\mu_I$ and $\sigma_I^2$ are the mean and variance of the number of
sexual partners and $T$ is the probability of transmission for each partnership
[7, 18].

We first partition the population into subpopulations 1, 2, ... such that
subpopulation $i$ consists of all persons with $I = i$. The proportion of the
population in subpopulation $i$ is equal to $P(I = i)$. Since there is a
constant probability $T$ of transmission for each partnership and $I = i$ is
the expected number of partners during the recovery period, the mean outdegree
of subpopulation $i$ is $Ti$. Since the hazard of infection in subpopulation
$i$ is also proportional to $i$, the mean indegree of subpopulation $i$ must
also be $Ti$. The indegree and outdegree of each individual are conditionally
independent given $I$, so the pgf for the degree distribution of nodes in
subpopulation $i$ can be written

\[ G_i(x, y) = G_i^{\mathrm{in}}(x) G_i^{\mathrm{out}}(y). \]

The pgf of the degree distribution of the epidemic percolation network is

\[ G(x, y) = \sum_i P(I = i) G_i^{\mathrm{in}}(x) G_i^{\mathrm{out}}(y). \]

The overall mean degree is $T\mu_I$, and the epidemic threshold is

\[ \frac{1}{T\mu_I} G^{(1,1)}(1, 1) = \frac{T}{\mu_I} \sum_i i^2 P(I = i) = \frac{T}{\mu_I}\left[\mu_I^2 + \sigma_I^2\right]. \]

Therefore, the epidemic percolation network correctly predicts the epidemic 
threshold for this model. However, it can also predict the size distribution of 
outbreaks and the probability and final size of an epidemic. Epidemic per- 
colation networks can also be used to analyze SIR models with more complex 
relationships between sexual activity, infectiousness, and susceptibility. 
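The agreement between the network threshold and the mean-variance formula can be checked numerically. The Python sketch below uses a hypothetical partner distribution and transmission probability; both are assumptions chosen only to show that heterogeneity can push the threshold above one when the mean degree is well below one.

```python
def std_thresholds(partner_pmf, T):
    """Return the threshold computed two ways.

    partner_pmf maps i -> P(I = i); T is the transmission probability
    for each partnership. Subpopulation i has mean indegree and mean
    outdegree T*i, which are conditionally independent given I = i.
    """
    mean = sum(i * p for i, p in partner_pmf.items())
    second = sum(i * i * p for i, p in partner_pmf.items())
    variance = second - mean ** 2
    from_network = (T * T * second) / (T * mean)      # G^(1,1)(1,1) / <k_d>
    from_formula = (T / mean) * (mean ** 2 + variance)
    return from_network, from_formula

# Hypothetical population: 90% have 1 partner, 10% have 10 partners.
pmf = {1: 0.9, 10: 0.1}
network_value, formula_value = std_thresholds(pmf, T=0.2)
mean_degree = 0.2 * sum(i * p for i, p in pmf.items())
```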

Example 2 (Outbreaks of controlled diseases) For diseases that have been
eliminated within a specific country but are not eradicated worldwide (such as
measles in the United States), imported cases and cases secondary to
importation can still occur. To evaluate the success of a disease elimination
program, it is important to determine whether the observed pattern of outbreaks
is consistent with sustained indigenous transmission. This can be inferred from
the final size distribution of outbreaks. If infectiousness and susceptibility
are independent and cases generate secondary cases according to a Poisson
distribution with mean $R_0 < 1$, all epidemics are finite and epidemic sizes
follow a Borel-Tanner distribution [24, 25]. The probability that an outbreak
has a final size of $k$ is

\[ \frac{(R_0 k)^{k-1} e^{-R_0 k}}{k!}. \tag{11} \]

The pgf for the Borel-Tanner distribution is the unique solution to the equation

\[ h(z) = z e^{R_0(h(z) - 1)}. \tag{12} \]

Below the phase transition, the pgf for the distribution of outbreak sizes in
the epidemic percolation network is $H^{\mathrm{out}}(z)$. Since the incoming
and outgoing degree are independent and the outgoing degree is Poisson
distributed with mean $R_0$,

\[ G_f(1, y) = G(1, y) = e^{R_0(y-1)}. \]

But then

\[ H_f^{\mathrm{out}}(z) = H^{\mathrm{out}}(z) = z e^{R_0(H^{\mathrm{out}}(z) - 1)}, \tag{13} \]

which is identical to equation (12). Therefore, $H^{\mathrm{out}}(z) = h(z)$,
so the out-component sizes in the epidemic percolation network have a
Borel-Tanner distribution. Using the fact that the probability of having an
outbreak of size one is $e^{-R_0}$, it is easy to check that the first few
iterations of equation (13) produce coefficients of the form in equation (11).
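The iteration can be carried out mechanically on truncated power series. The Python sketch below is one possible way to do this, assuming the self-consistency equation $h(z) = z e^{R_0(h(z) - 1)}$ and the Borel-Tanner probabilities $(R_0 k)^{k-1} e^{-R_0 k} / k!$; the function names are hypothetical, and the series exponential uses the standard recurrence $n f_n = \sum_{k=1}^{n} k g_k f_{n-k}$ for $f = e^{g}$ with $g_0 = 0$.

```python
import math

def exp_series(g, order):
    """Coefficients of exp(g(z)) up to z**order, assuming g[0] == 0."""
    f = [1.0] + [0.0] * order
    for n in range(1, order + 1):
        f[n] = sum(k * g[k] * f[n - k] for k in range(1, n + 1)) / n
    return f

def outbreak_size_pmf(r0, order):
    """Iterate h(z) <- z * exp(r0 * (h(z) - 1)) on truncated series.

    After m iterations the coefficients up to degree m are exact, so
    `order` iterations suffice. Returns h with h[k] = P(size = k)."""
    h = [0.0] * (order + 1)
    for _ in range(order):
        f = exp_series([r0 * c for c in h], order)
        h = [0.0] + [math.exp(-r0) * c for c in f[:order]]
    return h

def borel_tanner(r0, k):
    """Probability that an outbreak has final size k."""
    return (r0 * k) ** (k - 1) * math.exp(-r0 * k) / math.factorial(k)

pmf = outbreak_size_pmf(0.5, 8)
```

The first iteration gives the coefficient $e^{-R_0}$ for an outbreak of size one, and each further iteration fixes one more coefficient.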

Example 3 (Vector-borne diseases) The model of malaria developed by Ross
and Macdonald consists of humans and mosquitoes. Infected humans recover
from malaria at a constant rate $\gamma$, so the average recovery period is
$\gamma^{-1}$. There are $m$ susceptible mosquitoes that bite with a rate $a$
and are infected with probability $c$ when they bite an infectious human, so
each infectious human infects an average of $(amc)\gamma^{-1}$ susceptible
mosquitoes. The mortality rate of mosquitoes is $\mu$, so they survive for an
average of $\mu^{-1}$ time units after being infected. When an infectious
mosquito bites a susceptible human, the human is infected with probability $b$,
so each infectious mosquito infects an average of $(ab)\mu^{-1}$ humans.
The epidemic threshold in this model is defined by

\[ R_0 = \frac{ma^2bc}{\gamma\mu} = 1. \]

This was one of the earliest applications of the basic reproductive number [18].

The full epidemic percolation network for a vector-borne disease would in- 
clude nodes representing humans and vectors. Humans infect vectors and 
vectors infect humans, so every edge in this epidemic percolation network links 
nodes of different types. Such a network is called a bipartite network.
Probability generating functions can be used to analyze undirected bipartite
networks [2], and these methods can be adapted to directed bipartite graphs.
Using the subscript $h$ for host and $v$ for vector, let $G_h(x, y)$ be the pgf
for the degree distribution among humans and let $G_v(x, y)$ be the pgf for the
degree distribution among vectors. The epidemic percolation network among
humans can then be constructed by drawing an edge from person $i$ to person $j$
if there is a vector that transfers infection from $i$ to $j$. The pgf for the
degree distribution in the human-human epidemic percolation network is

\[ G_H(x, y) = \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} p_{jk} \left(G_{vr}(x, 1)\right)^j \left(G_{vf}(1, y)\right)^k = G_h(G_{vr}(x, 1), G_{vf}(1, y)), \]

where $p_{jk}$ is the probability that a human node in the human-vector
epidemic percolation network has $j$ incoming edges and $k$ outgoing edges,
$G_{vr}(x, y)$ is the pgf for the degree of a vector reached by going backwards
along an edge, and $G_{vf}(x, y)$ is the pgf for the degree of a vector reached
by going forward along an edge. The outbreak size distribution, the epidemic
threshold, and the probability and final size of an epidemic among humans can
be predicted using $G_H(x, y)$.

In the Ross-Macdonald model, infectiousness and susceptibility are independent
among both humans and mosquitoes. Therefore, $G_{hf}(1, y) = G_h(1, y)$ and
$G_{vf}(1, y) = G_v(1, y)$. The pgf for the out-degree of human nodes is

\[ G_h(1, y) = \int_0^\infty e^{amc\,r(y-1)} \gamma e^{-\gamma r} \, dr = \frac{1}{1 - amc\gamma^{-1}(y - 1)}, \]

and the pgf for the out-degree of mosquito nodes is

\[ G_v(1, y) = \int_0^\infty e^{ab\,r(y-1)} \mu e^{-\mu r} \, dr = \frac{1}{1 - ab\mu^{-1}(y - 1)}. \]

Since the indegree and outdegree are independent in the human-to-human
epidemic percolation network,

\[ G_{Hf}(1, y) = G_H(1, y) = G_h(1, G_{vf}(1, y)). \]

Taking the derivative of $G_{Hf}(1, y)$ at $y = 1$,

\[ G_{Hf}^{(0,1)}(1, 1) = G_h^{(0,1)}(1, 1) G_{vf}^{(0,1)}(1, 1) = \frac{amc}{\gamma} \cdot \frac{ab}{\mu}, \]

and we see that the epidemic threshold occurs when
$\frac{ma^2bc}{\gamma\mu} = 1$, which is identical to the threshold derived by
Ross and Macdonald.
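The chain-rule step can be verified numerically by composing the two out-degree pgfs and differentiating at $y = 1$. The Python sketch below assumes geometric out-degree pgfs for humans and mosquitoes with means $amc/\gamma$ and $ab/\mu$; the parameter values are hypothetical and chosen only for illustration.

```python
def human_out_pgf(y, a, m, c, gamma):
    """G_h(1, y): out-degree pgf of human nodes (mean a*m*c / gamma)."""
    return 1.0 / (1.0 - (a * m * c / gamma) * (y - 1.0))

def vector_out_pgf(y, a, b, mu):
    """G_v(1, y): out-degree pgf of mosquito nodes (mean a*b / mu)."""
    return 1.0 / (1.0 - (a * b / mu) * (y - 1.0))

def human_human_out_pgf(y, a, b, c, m, gamma, mu):
    """G_H(1, y) = G_h(1, G_v(1, y))."""
    return human_out_pgf(vector_out_pgf(y, a, b, mu), a, m, c, gamma)

def derivative_at_one(f, step=1e-6):
    """Central-difference approximation to f'(1)."""
    return (f(1.0 + step) - f(1.0 - step)) / (2.0 * step)

# Hypothetical parameter values.
params = dict(a=0.3, b=0.5, c=0.5, m=10.0, gamma=0.1, mu=0.1)
numeric = derivative_at_one(lambda y: human_human_out_pgf(y, **params))
closed_form = (params["m"] * params["a"] ** 2 * params["b"] * params["c"]
               / (params["gamma"] * params["mu"]))
```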



6 Discussion 

For the epidemic models considered in this paper, methods of finding the exact 
distribution of outbreak sizes for a homogeneous population of any fixed size n 
exist [7, 15, 17]. However, these methods involve solving a recursive system of 
n equations. By performing these calculations in the limit of a large popula- 
tion, the methods presented in this paper allow a much simpler derivation of 
the distribution of self-limited outbreak sizes, the epidemic threshold, and the 
probability and final size of an epidemic. Our methods also generalize much 
more easily to heterogeneous populations. 

As proven in the Appendix, the problem of analyzing the final outcomes of 
any time-homogeneous stochastic SIR model can be reduced to the problem of 
analyzing the components of an epidemic percolation network. In [5], we showed
how epidemic percolation networks can be used to analyze network-based models 
of the type studied by Newman [1]. In this paper, we showed that epidemic 
percolation networks can be used to analyze stochastic SIR models with random 
and proportionate mixing. In the limit of a large population, the epidemic 
percolation network for these models is purely directed. Using the probability 
generating function for its degree distribution, we accurately predicted the mean 
size of outbreaks and the probability and final size of epidemics for a variety of 
models in homogeneous and heterogeneous populations. 

The ability of epidemic percolation networks to analyze both network-based 
and fully-mixed epidemic models makes them a simple but powerful
generalization of earlier methods of analyzing stochastic SIR models. We showed
that epidemic percolation networks with independent indegree and outdegree are
equivalent to forward and backward branching processes, and we used epidemic 
percolation networks to re-derive classical results from sexually transmitted dis- 
eases, vector-borne diseases, and controlled diseases. Epidemic percolation 
networks may also provide a novel and useful qualitative insight into the con- 
trol of epidemics. The emergence of epidemics corresponds to the emergence of 
the GSCC in the epidemic percolation network, so nodes with a high probability 
of being in the GSCC may be important targets for interventions designed to 
reduce the probability and final size of an epidemic.

A Epidemic percolation networks

It is possible to define epidemic percolation networks for a much wider range of 
stochastic epidemic models than that from the Introduction. First, we specify 
an SIR epidemic model using probability distributions for infectious periods 
in individuals and times from infection to infectious contact in ordered pairs of 
individuals. Second, we outline time-homogeneity assumptions under which the 
epidemic percolation network is defined. Finally, we define infection networks 
and use them to show that the final outcome of the epidemic model depends 
only on the set of initial infections and the epidemic percolation network. This 
discussion is adapted from that of our previous paper [5].

A.1 Model specification

Suppose there is a closed population in which every susceptible person is
assigned an index $i \in \{1, \ldots, n\}$. A susceptible person is infected
upon infectious contact, and infection leads to recovery with immunity or
death. Each person $i$ is infected at his or her infection time $t_i$, with
$t_i = \infty$ if $i$ is never infected. Person $i$ is removed (i.e., recovers
from infectiousness or dies) at time $t_i + r_i$, where the recovery period
$r_i$ is a random variable with the cumulative distribution function (cdf)
$F_i(r)$. The recovery period may be the sum of a latent period, when $i$ is
infected but not yet infectious, and an infectious period, when $i$ can
transmit infection. We assume that all infected persons have a finite
recovery period. Let $S(t) = \{i : t_i > t\}$ be the set of susceptible
individuals at time $t$. Let $t_{(1)} \le t_{(2)} \le \ldots \le t_{(n)}$ be
the order statistics of $t_1, \ldots, t_n$, and let $i_k$ be the index of the
$k$th person infected.

When person $i$ is infected, he or she makes infectious contact with person
$j \neq i$ after an infectious contact interval $\tau_{ij}$. Given $r_i$, each
$\tau_{ij}$ has a conditional cdf $F_{ij}(\tau | r_i)$. Let
$\tau_{ij} = \infty$ if person $i$ never makes infectious contact with person
$j$, so $F_{ij}(\tau | r_i)$ may have a probability mass concentrated at
infinity. Person $i$ cannot transmit disease before being infected or after
recovering from infectiousness, so $F_{ij}(\tau | r_i) = 0$ for all
$\tau < 0$ and $F_{ij}(\tau | r_i)$ is equal to the conditional probability of
transmission from $i$ to $j$ given $r_i$ for all $\tau \in [r_i, \infty)$.
The infectious contact time $t_{ij} = t_i + \tau_{ij}$ is the time at which
person $i$ makes infectious contact with person $j$. If person $j$ is
susceptible at time $t_{ij}$, then $i$ infects $j$ and $t_j = t_{ij}$. If
$t_{ij} < \infty$, then we must have $t_j \le t_{ij}$ because person $j$
avoids infection at $t_{ij}$ only if he or she has already been infected.

For each person $i$, let his or her importation time $t_{0i}$ be the first time
at which he or she experiences infectious contact from outside the population,
with $t_{0i} = \infty$ if this never occurs. Let $F_0(\mathbf{t}_0)$ be the cdf
of the importation time vector
$\mathbf{t}_0 = (t_{01}, t_{02}, \ldots, t_{0n})$.

A.2 Epidemic algorithm

Before an epidemic begins, an importation time vector $\mathbf{t}_0$ is chosen. The
epidemic begins with the introduction of infection at time $t_{(1)} = \min_i(t_{0i})$.
Person $i_1$ is assigned a recovery period $r_{i_1}$. Every person $j \in S(t_{(1)})$ is
assigned an infectious contact time $t_{i_1 j} = t_{(1)} + \tau_{i_1 j}$. We assume that there
are no tied infectious contact times less than infinity. The second infection
occurs at $t_{(2)} = \min_{j \in S(t_{(1)})} \min(t_{0j}, t_{i_1 j})$, which is the time of the first
infectious contact after person $i_1$ is infected. Person $i_2$ is assigned a
recovery period $r_{i_2}$. After the second infection, each of the remaining
susceptibles is assigned an infectious contact time $t_{i_2 j} = t_{(2)} + \tau_{i_2 j}$. The
third infection occurs at $t_{(3)} = \min_{j \in S(t_{(2)})} \min(t_{0j}, t_{i_1 j}, t_{i_2 j})$, and so on.
After $k$ infections, the next infection occurs at
$t_{(k+1)} = \min_{j \in S(t_{(k)})} \min(t_{0j}, t_{i_1 j}, \dots, t_{i_k j})$. The epidemic stops after
$m$ infections if and only if $t_{(m+1)} = \infty$.
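
The steps above can be sketched in Python. This is a minimal illustration rather than the authors' implementation: it assumes that the importation times and all infectious contact intervals have already been drawn (the lists `t0` and `tau` are hypothetical names), whereas the text assigns them at the moment of infection. Recovery periods are left out because they influence infection times only through the contact intervals.

```python
import math

def run_epidemic(t0, tau):
    """Return infection times for pre-drawn importation times and intervals.

    t0[j]     importation time of person j (math.inf if never imported)
    tau[i][j] infectious contact interval from i to j (math.inf if none)
    """
    n = len(t0)
    t = [math.inf] * n
    # Earliest infectious contact time seen so far for each person.
    pending = list(t0)
    susceptible = set(range(n))
    while susceptible:
        # Next infection: the susceptible with the smallest pending contact time.
        j = min(susceptible, key=lambda s: pending[s])
        if pending[j] == math.inf:
            break  # t_(m+1) is infinite, so the epidemic stops
        t[j] = pending[j]
        susceptible.remove(j)
        # The newly infected person j assigns contact times to the susceptibles.
        for s in susceptible:
            pending[s] = min(pending[s], t[j] + tau[j][s])
    return t

inf = math.inf
t0 = [0.0, inf, inf, inf]
tau = [
    [inf, 1.0, inf, inf],
    [inf, inf, 0.5, inf],
    [inf, inf, inf, inf],
    [inf, inf, inf, inf],
]
times = run_epidemic(t0, tau)
print(times)
```

In this toy example person 0 is imported at time 0, infects person 1 at time 1.0, who infects person 2 at time 1.5; person 3 receives no finite contact and is never infected.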

A.3 Time homogeneity assumptions

In principle, the above epidemic algorithm could allow the distributions of the 
recovery period and outgoing infectious contact intervals for individual i to 
depend on all information about the epidemic available up to time ti. In order 
to generate an epidemic percolation network, we must ensure that the joint 
distribution of recovery periods and conditional transmission probabilities for 
all ordered pairs of individuals are defined a priori. In order to do this, we 
place the following restrictions on the model: 

1. We assume that the distribution of the recovery period vector
$\mathbf{r} = (r_1, r_2, \dots, r_n)$ does not depend on the importation time vector
$\mathbf{t}_0$, the contact interval matrix $\boldsymbol{\tau} = [\tau_{ij}]$, or the history of the
epidemic.

2. We assume that the distribution of the infectious contact interval matrix
$\boldsymbol{\tau}$ does not depend on $\mathbf{t}_0$ or on the history of the epidemic.

With these time-homogeneity assumptions, the cumulative distribution
functions $F(\mathbf{r})$ of recovery periods and $F(\boldsymbol{\tau} \mid \mathbf{r})$ of infectious contact
intervals are defined a priori. Given $\mathbf{r}$ and $\boldsymbol{\tau}$, the epidemic percolation
network is a semi-directed network in which there is a directed edge from $i$
to $j$ iff $\tau_{ij} < \infty$ and $\tau_{ji} = \infty$, a directed edge from $j$ to $i$ iff
$\tau_{ij} = \infty$ and $\tau_{ji} < \infty$, and an undirected edge between $i$ and $j$ iff
$\tau_{ij} < \infty$ and $\tau_{ji} < \infty$. The entire time course of the epidemic is
determined by $\mathbf{r}$, $\boldsymbol{\tau}$, and $\mathbf{t}_0$. However, its final size depends only on the
set $\{i : t_{0i} < \infty\}$ of imported infections and the epidemic percolation
network. In order to prove this, we first define the infection network, which
records the chain of infection from a single realization of the epidemic model.
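
The edge rules translate directly into code. The following sketch, with hypothetical names and a small hand-made interval matrix, classifies each pair of nodes according to which of the two contact intervals are finite.

```python
import math

def percolation_network(tau):
    """Build the semi-directed network implied by a contact interval matrix.

    Returns (directed, undirected): directed holds ordered pairs (i, j) for an
    edge from i to j, and undirected holds pairs (i, j) with i < j.
    """
    n = len(tau)
    directed, undirected = set(), set()
    for i in range(n):
        for j in range(i + 1, n):
            ij = tau[i][j] < math.inf
            ji = tau[j][i] < math.inf
            if ij and ji:
                undirected.add((i, j))
            elif ij:
                directed.add((i, j))
            elif ji:
                directed.add((j, i))
    return directed, undirected

inf = math.inf
tau = [
    [inf, 0.3, inf],
    [0.7, inf, 1.2],
    [inf, inf, inf],
]
directed, undirected = percolation_network(tau)
print(directed, undirected)
```

Here nodes 0 and 1 can each make infectious contact with the other, so they share an undirected edge, while only node 1 can make contact with node 2, which gives a directed edge from 1 to 2.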

A.4 Infection networks

Let $v_i$ be the index of the person who infected person $i$, with $v_i = 0$ for
imported infections and $v_i = \infty$ for uninfected nodes. If tied finite
infectious contact times have probability zero, then $v_i$ is the unique $j$ such
that $t_{ji} = t_i$. If tied finite infectious contact times are possible, then
choose $v_i$ from all $j$ such that $t_{ji} = t_i$. The infection network is a network
with an edge set $\{v_i i : 0 < v_i < \infty\}$. It is a subgraph of the epidemic
percolation network because $\tau_{v_i i} < \infty$ for every edge $v_i i$. Since each node
has at most one incoming edge, all components of the infection network are
trees. Every imported case is either the root node of a tree or an isolated
node.

The infection network can be represented by a vector $\mathbf{v} = (v_1, \dots, v_n)$, which
is identical to the "infection network" defined by Wallinga and Teunis [26].
If $v_j = 0$, then the infection time of $j$ is specified by $\mathbf{t}_0$. If $j$ was
infected through transmission within the population, then it is connected to
an imported infection $\mathrm{imp}_j$ in the infection network and its infection time is

$$t_j = t_{0\,\mathrm{imp}_j} + \sum_{k=1}^{m} \tau_{e_k},$$

where the edges $e_1, \dots, e_m$ form a directed path from $\mathrm{imp}_j$ to $j$. This path
is unique because all nontrivial components of the infection network are trees.
The infection times of all other nodes are infinite. The removal time of each
node $i$ is $t_i + r_i$. Therefore, the entire time course of the epidemic is
determined by the importation time vector $\mathbf{t}_0$, the recovery period vector $\mathbf{r}$,
and the infectious contact interval matrix $\boldsymbol{\tau}$.
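
To make the path sum concrete, the sketch below recovers infection times from an infection network vector by walking each node back toward its imported case. The dictionaries `v`, `t0`, and `tau` and the numbers in them are hypothetical; persons are labelled 1 to n so that 0 can mark an imported infection, as in the text.

```python
import math

def infection_times(v, t0, tau):
    """Infection time of every node, given the infection network vector v.

    v[j]        0 for an imported infection, math.inf if j is never infected,
                otherwise the label of the person who infected j
    t0[j]       importation time of j
    tau[(i, j)] infectious contact interval from i to j
    """
    t = {}

    def time_of(j):
        if j not in t:
            if v[j] == 0:
                t[j] = t0[j]
            elif v[j] == math.inf:
                t[j] = math.inf
            else:
                # One step along the unique path back to the imported case.
                t[j] = time_of(v[j]) + tau[(v[j], j)]
        return t[j]

    return {j: time_of(j) for j in v}

inf = math.inf
v = {1: 0, 2: 1, 3: 2, 4: inf}
t0 = {1: 0.0, 2: inf, 3: inf, 4: inf}
tau = {(1, 2): 1.0, (2, 3): 0.5}
t = infection_times(v, t0, tau)
print(t)
```

The recursion terminates because every nontrivial component of the infection network is a tree rooted at an imported case.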

A.5 Final outcomes and percolation networks

Theorem 3 In an epidemic with recovery period vector $\mathbf{r}$ and infectious contact
interval matrix $\boldsymbol{\tau}$, a node is infected if and only if it is in the out-component
of a node $i$ with $t_{0i} < \infty$ in the epidemic percolation network. Equivalently, a
node is infected if and only if its in-component includes a node $i$ with
$t_{0i} < \infty$.

Proof. Suppose that person $j$ is in the out-component of a node $i$ with
$t_{0i} < \infty$ in the epidemic percolation network. Then there is a series of edges
$e_1, \dots, e_m$ such that the initial node of $e_1$ is $i$, the terminal node of $e_m$ is
$j$, and $\tau_{e_k} < \infty$ for all $1 \le k \le m$. Person $j$ receives an infectious contact
at or before

$$t^* = t_{0i} + \sum_{k=1}^{m} \tau_{e_k},$$

so $t_j \le t^* < \infty$ and $j$ must be infected during the epidemic. To prove the
converse, suppose that $t_j < \infty$. Then there exists an imported case $i$ and a
directed path with edges $e_1, \dots, e_m$ from $i$ to $j$ such that

$$t_j = t_{0i} + \sum_{k=1}^{m} \tau_{e_k}.$$

Since $t_j < \infty$, it follows that all $\tau_{e_k} < \infty$. But then each $e_k$ must be an edge
with the proper direction in the epidemic percolation network, so $j$ is in the
out-component of $i$. ■
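
The theorem can be checked numerically on small random examples. The sketch below is an illustration under stated assumptions, not the simulation code used in the paper: recovery periods are exponential with mean 1, contact intervals follow a constant hazard `beta` during the recovery period, and a single randomly chosen person is imported at time 0. Infection times are computed with a priority queue, which always processes the earliest pending infectious contact first, and the infected set is compared with the set of nodes reachable from the imported case along finite contact intervals.

```python
import heapq
import math
import random

def simulate(t0, tau):
    """Infection times when the earliest pending infectious contact wins."""
    n = len(t0)
    t = [math.inf] * n
    heap = [(t0[j], j) for j in range(n) if t0[j] < math.inf]
    heapq.heapify(heap)
    while heap:
        time, j = heapq.heappop(heap)
        if t[j] < math.inf:
            continue  # j was already infected by an earlier contact
        t[j] = time
        for k in range(n):
            if t[k] == math.inf and tau[j][k] < math.inf:
                heapq.heappush(heap, (time + tau[j][k], k))
    return t

def out_component_of_imports(t0, tau):
    """Nodes reachable from imported cases along edges with finite tau."""
    n = len(t0)
    seen = {j for j in range(n) if t0[j] < math.inf}
    stack = list(seen)
    while stack:
        i = stack.pop()
        for j in range(n):
            if j not in seen and tau[i][j] < math.inf:
                seen.add(j)
                stack.append(j)
    return seen

rng = random.Random(7)
n, beta = 40, 0.05  # illustrative values only
all_match = True
for _ in range(200):
    r = [rng.expovariate(1.0) for _ in range(n)]
    tau = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                draw = rng.expovariate(beta)
                if draw < r[i]:
                    tau[i][j] = draw
    t0 = [math.inf] * n
    t0[rng.randrange(n)] = 0.0
    infected = {j for j, tj in enumerate(simulate(t0, tau)) if tj < math.inf}
    all_match = all_match and infected == out_component_of_imports(t0, tau)
print(all_match)
```

Following a finite `tau[i][j]` from i to j covers both a directed edge from i to j and an undirected edge between them, so the reachable set is the out-component in the semi-directed network.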

By the law of iterated expectation (conditioning on r), this result implies 
that the probability distribution of outbreak sizes caused by the introduction 
of infection to node i is identical to that of his or her out-component sizes 
in the probability space of epidemic percolation networks. Furthermore, the 
probability that person i gets infected in an epidemic is equal to the probability 
that his or her in-component contains at least one imported infection. In the 
limit of a large population, the probability that node i is infected in an epidemic 
is equal to the probability that he or she is in the GOUT and the probability 
that an epidemic results from the infection of node i is equal to the probability 
that he or she is in the GIN. 








B Tables and figures

Figure 1: "Bowtie" diagram showing the giant components, tendrils, and tubes
of a supercritical semi-directed network. Adapted from Broder et al. [8] and
Dorogovtsev et al. [9].

Figure 2: Probability of an epidemic in a homogeneous population as a function
of $R_0$ for recovery time distributions from Table 1.

Figure 3: Scatterplot of observed and predicted epidemic probabilities for all
homogeneous population models, with the linear regression equation and $R^2$.



Figure 4: Predicted and observed final sizes of epidemics as a function of $R_0$
for all homogeneous population models. The predicted final size was the same
for all recovery time distributions.



Figure 5: Predicted and observed probabilities of an epidemic as a function of
A for the first two series of heterogeneous population models. For both fixed
and exponentially distributed recovery times, the legend lists the average of
10,000 simulations, the percolation network prediction, and the branching
process prediction.

Figure 6: Predicted and observed final sizes of an epidemic as a function of A
for the first two series of heterogeneous population models. The final sizes of
epidemics are the same for both fixed and exponentially distributed recovery
times.

Figure 7: Scatterplot of observed and predicted epidemic probabilities in the
third series of heterogeneous population models. The "subpopulation A" and
"subpopulation B" points show conditional epidemic probabilities given an
initial case in subpopulation A and B respectively. The "combined" points show
the epidemic probability when the initial case is randomly chosen from the
entire population. The linear regression equation and $R^2$ are shown separately
for all three sets of points.

Figure 8: Scatterplot of the observed and predicted cumulative hazard of
infection in an epidemic. Linear regression equations and $R^2$ are shown
separately for homogeneous and heterogeneous population models.

Figure 9: Scatterplot of observed and predicted mean sizes of outbreaks in the
third series of heterogeneous population models. Linear regression equations
and $R^2$ are shown separately for an initial case randomly chosen from
subpopulation A, subpopulation B, and from the overall population.
Distribution        Density function                                   Support             Variance   P(t < .5)
Uniform(.5, 1.5)    $1$                                                $.5 < t < 1.5$      .0833      0
.4+Gamma(3, .2)     $62.5(t - .4)^2 e^{-5(t - .4)}$                    $.4 < t < \infty$   .12        .0143877
.5+Exponential(2)   $2e^{-2(t - .5)}$                                  $.5 < t < \infty$   .25        0
Uniform(0, 2)       $.5$                                               $0 < t < 2$         .333       .25
Gamma(2, .5)        $4te^{-2t}$                                        $0 < t < \infty$    .5         .264241
Exponential(1)      $e^{-t}$                                           $0 < t < \infty$    1          .393469
LogNormal(-.5, 1)   $\frac{1}{t\sqrt{2\pi}} e^{-.5(.5 + \ln t)^2}$     $0 < t < \infty$    1.71828    .423422
ChiSquare(1)        $\frac{1}{\sqrt{2\pi t}} e^{-.5t}$                 $0 < t < \infty$    2          .5205
Weibull(.5, .5)     $\frac{1}{\sqrt{2t}} e^{-\sqrt{2t}}$               $0 < t < \infty$    5          .632121
Pareto(.5, 2)       $.5t^{-3}$                                         $.5 < t < \infty$   $\infty$   0
Uniform(0, 4)*      $.25$                                              $0 < t < 4$
Exponential(.5)*    $.5e^{-.5t}$                                       $0 < t < \infty$

Table 1: Recovery time distributions for the third series of homogeneous pop- 
ulation models. From top to bottom, they are in order of increasing variance. 
The bottom two distributions are used only in the third series of heterogeneous 
population models. 
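
The nonzero P(t < .5) entries in Table 1 can be reproduced from closed-form cdfs. The check below is a sketch that assumes parameterizations inferred from the variance column rather than stated in the text: Gamma(shape, scale), Exponential(rate), LogNormal(mu, sigma), and Weibull(shape, scale). Variable names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Standardized argument for the shifted Gamma(3, .2): (.5 - .4) / .2
x = (0.5 - 0.4) / 0.2

# P(t < .5) under the assumed parameterizations.
computed = {
    ".4+Gamma(3, .2)": 1 - math.exp(-x) * (1 + x + x ** 2 / 2),
    "Uniform(0, 2)": 0.5 / 2,
    "Gamma(2, .5)": 1 - math.exp(-1.0) * (1 + 1.0),
    "Exponential(1)": 1 - math.exp(-0.5),
    "LogNormal(-.5, 1)": normal_cdf(math.log(0.5) + 0.5),
    "ChiSquare(1)": math.erf(math.sqrt(0.5 / 2)),
    "Weibull(.5, .5)": 1 - math.exp(-math.sqrt(0.5 / 0.5)),
}

# Values as printed in Table 1.
tabulated = {
    ".4+Gamma(3, .2)": 0.0143877,
    "Uniform(0, 2)": 0.25,
    "Gamma(2, .5)": 0.264241,
    "Exponential(1)": 0.393469,
    "LogNormal(-.5, 1)": 0.423422,
    "ChiSquare(1)": 0.5205,
    "Weibull(.5, .5)": 0.632121,
}

max_gap = max(abs(computed[name] - tabulated[name]) for name in tabulated)
print(max_gap)
```

Distributions whose support begins at .5 or later are omitted because their probability below .5 is zero by construction.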